Message ID | 1bf36b8e08deb3d16fafde3e88ae7cd761e4e7b3.1677377639.git.geoff@infradead.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net/ps3_gelic_net: DMA related fixes | expand |
On Sun, 26 Feb 2023 02:25:42 +0000 Geoff Levand wrote: > + napi_buff = napi_alloc_frag_align(GELIC_NET_MAX_MTU, > + GELIC_NET_RXBUF_ALIGN); You're changing how the buffers are allocated. > + if (unlikely(!napi_buff)) { > + descr->skb = NULL; > + descr->buf_addr = 0; > + descr->buf_size = 0; Wiping the descriptors on failure. > + return -ENOMEM; > + } And generally reshuffling the code. Once again - please don't do any of that in a bug fix. Describe precisely what the problem is and fix that problem, Once the fix is accepted you can send separate patches with other improvements.
From: Jakub Kicinski <kuba@kernel.org> Date: Mon, 27 Feb 2023 18:20:40 -0800 > On Sun, 26 Feb 2023 02:25:42 +0000 Geoff Levand wrote: >> + napi_buff = napi_alloc_frag_align(GELIC_NET_MAX_MTU, >> + GELIC_NET_RXBUF_ALIGN); > > You're changing how the buffers are allocated. > >> + if (unlikely(!napi_buff)) { >> + descr->skb = NULL; >> + descr->buf_addr = 0; >> + descr->buf_size = 0; > > Wiping the descriptors on failure. > >> + return -ENOMEM; >> + } > > And generally reshuffling the code. > > Once again - please don't do any of that in a bug fix. > Describe precisely what the problem is and fix that problem, IIRC the original problem is that the skb linear parts are not always aligned to a boundary which this particular HW requires. So initially there was something like "allocate len + alignment - 1, then PTR_ALIGN()", but I said that it's a waste of memory and we shouldn't do that, using napi_alloc_frag_align() instead. I guess if that would've been described, this could go as a fix? I don't think wasting memory is a good fix, even if we need to change the allocation scheme... > Once the fix is accepted you can send separate patches with > other improvements. Thanks, Olek
From: Geoff Levand <geoff@infradead.org> Date: Sun, 26 Feb 2023 02:25:42 +0000 > The Gelic Ethernet device needs to have the RX sk_buffs aligned to > GELIC_NET_RXBUF_ALIGN and the length of the RX sk_buffs must be a > multiple of GELIC_NET_RXBUF_ALIGN. [...] > static int gelic_descr_prepare_rx(struct gelic_card *card, > struct gelic_descr *descr) > { > - int offset; > - unsigned int bufsize; > + struct device *dev = ctodev(card); > + void *napi_buff; > > if (gelic_descr_get_status(descr) != GELIC_DESCR_DMA_NOT_IN_USE) > dev_info(ctodev(card), "%s: ERROR status\n", __func__); > - /* we need to round up the buffer size to a multiple of 128 */ > - bufsize = ALIGN(GELIC_NET_MAX_MTU, GELIC_NET_RXBUF_ALIGN); > - > - /* and we need to have it 128 byte aligned, therefore we allocate a > - * bit more */ > - descr->skb = dev_alloc_skb(bufsize + GELIC_NET_RXBUF_ALIGN - 1); > - if (!descr->skb) { > - descr->buf_addr = 0; /* tell DMAC don't touch memory */ > - return -ENOMEM; > - } > - descr->buf_size = cpu_to_be32(bufsize); > + > descr->dmac_cmd_status = 0; > descr->result_size = 0; > descr->valid_size = 0; > descr->data_error = 0; > + descr->buf_size = cpu_to_be32(GELIC_NET_MAX_MTU); > + > + napi_buff = napi_alloc_frag_align(GELIC_NET_MAX_MTU, > + GELIC_NET_RXBUF_ALIGN); Must be aligned to the opening brace. > + Redundant newline. > + if (unlikely(!napi_buff)) { > + descr->skb = NULL; > + descr->buf_addr = 0; > + descr->buf_size = 0; > + return -ENOMEM; > + } > + > + descr->skb = napi_build_skb(napi_buff, GELIC_NET_MAX_MTU); You're mixing two, no, three things here. 1. MTU. I.e. max L3+ payload len. It doesn't include Eth header, VLAN and FCS. 2. Max frame size. It is MTU + the abovementioned. Usually it's something like `some power of two - 1`. 3. skb truesize. It is: frame size (i.e. 2) + headroom (usually %NET_SKB_PAD when !XDP, %XDP_PACKET_HEADROOM otherwise, plus %NET_IP_ALIGN) + tailroom (SKB_DATA_ALIGN(sizeof(struct skb_shared_info), see SKB_WITH_OVERHEAD() and adjacent macros). I'm not sure whether your definition is the first or the second... or maybe third? You had 1522, but then changed to 1920? You must pass the third to napi_build_skb(). So you should calculate the truesize first, then allocate a frag and build an skb. Then skb->data will point to the free space with the length of your max frame size. And the truesize calculation might be not just a hardcoded value, but an expression where you add all those head- and tailrooms, so that they will be calculated depending on the platform's configuration. Your current don't reserve any space as headroom, so that frames / HW visible part starts at the very beginning of a frag. It's okay, I mean, there will be reallocations when the stack needs more headroom, but definitely not something to kill the system. You could leave it to an improvement series in the future*. But it either way *must* include tailroom, e.g. SKB_DATA_ALIGN(see_above), otherwise your HW might overwrite kernel structures. * given that the required HW alignment is 128, I'd just allocate 128 bytes more and then do `skb_reserve(skb, RXBUF_HW_ALIGN` right after napi_build_skb() to avoid reallocations. > + This is also redundant. > + if (unlikely(!descr->skb)) { > + skb_free_frag(napi_buff); > + descr->skb = NULL; > + descr->buf_addr = 0; > + descr->buf_size = 0; > + return -ENOMEM; > + } > + > + descr->buf_addr = cpu_to_be32(dma_map_single(dev, napi_buff, > + GELIC_NET_MAX_MTU, DMA_FROM_DEVICE)); > > - offset = ((unsigned long)descr->skb->data) & > - (GELIC_NET_RXBUF_ALIGN - 1); > - if (offset) > - skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset); > - /* io-mmu-map the skb */ > - descr->buf_addr = cpu_to_be32(dma_map_single(ctodev(card), > - descr->skb->data, > - GELIC_NET_MAX_MTU, > - DMA_FROM_DEVICE)); > if (!descr->buf_addr) { > - dev_kfree_skb_any(descr->skb); > + skb_free_frag(napi_buff); > descr->skb = NULL; > - dev_info(ctodev(card), > - "%s:Could not iommu-map rx buffer\n", __func__); > + descr->buf_addr = 0; > + descr->buf_size = 0; > + dev_err_once(dev, "%s:Could not iommu-map rx buffer\n", > + __func__); > gelic_descr_set_status(descr, GELIC_DESCR_DMA_NOT_IN_USE); > return -ENOMEM; > - } else { > - gelic_descr_set_status(descr, GELIC_DESCR_DMA_CARDOWNED); > - return 0; > } > + > + gelic_descr_set_status(descr, GELIC_DESCR_DMA_CARDOWNED); > + return 0; An empty newline is preferred before any `return`. > } > > /** > diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.h b/drivers/net/ethernet/toshiba/ps3_gelic_net.h > index 68f324ed4eaf..d3868eb7234c 100644 > --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.h > +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.h > @@ -19,7 +19,7 @@ > #define GELIC_NET_RX_DESCRIPTORS 128 /* num of descriptors */ > #define GELIC_NET_TX_DESCRIPTORS 128 /* num of descriptors */ > > -#define GELIC_NET_MAX_MTU VLAN_ETH_FRAME_LEN > +#define GELIC_NET_MAX_MTU 1920 > #define GELIC_NET_MIN_MTU VLAN_ETH_ZLEN > #define GELIC_NET_RXBUF_ALIGN 128 > #define GELIC_CARD_RX_CSUM_DEFAULT 1 /* hw chksum */ Thanks, Olek
On Tue, 28 Feb 2023 16:47:25 +0100 Alexander Lobakin wrote: > >> + return -ENOMEM; > >> + } > > > > And generally reshuffling the code. > > > > Once again - please don't do any of that in a bug fix. > > Describe precisely what the problem is and fix that problem, > > IIRC the original problem is that the skb linear parts are not always > aligned to a boundary which this particular HW requires. So initially > there was something like "allocate len + alignment - 1, then > PTR_ALIGN()", Let's focus on where the bug is. At a quick look I'm guessing that the bug is that we align the CPU buffer instead of the DMA buffer. We should pass the entire allocate len + alignment - 1 as length to dma_map() and then align the dma_addr. dma_addr is what the device sees. Not the virtual address of skb->data. If I'm right the bug is not in fact directly addressed by the patch. I'm guessing the patch helps because at least the patch passes the aligned length to the dma_map(), rather than GELIC_NET_MAX_MTU (which is not a multiple of 128). > but I said that it's a waste of memory and we shouldn't do > that, using napi_alloc_frag_align() instead. > I guess if that would've been described, this could go as a fix? I don't > think wasting memory is a good fix, even if we need to change the > allocation scheme... In general doing such a rewrite may be fine, if the author is an expert and the commit message is very precise. Here we are at v6 already and IMHO the problem is not even well understood. Let's focus on understanding the problem and writing a _minimal_ fix. The waste of memory claim can't be taken seriously when the MTU define is bumped by 500 with no mention in the commit msg, as you also noticed. Minimal fix, please.
Hi, On 2/28/23 12:31, Jakub Kicinski wrote: >> >> IIRC the original problem is that the skb linear parts are not always >> aligned to a boundary which this particular HW requires. So initially >> there was something like "allocate len + alignment - 1, then >> PTR_ALIGN()", > > Let's focus on where the bug is. At a quick look I'm guessing that > the bug is that we align the CPU buffer instead of the DMA buffer. > We should pass the entire allocate len + alignment - 1 as length > to dma_map() and then align the dma_addr. dma_addr is what the device > sees. Not the virtual address of skb->data. > > If I'm right the bug is not in fact directly addressed by the patch. > I'm guessing the patch helps because at least the patch passes the > aligned length to the dma_map(), rather than GELIC_NET_MAX_MTU (which > is not a multiple of 128). I found some old notes for the gelic network device. It had values for the max frame size, and the max MTU size. These values are the same as what is in the 'spider net' driver. For patch set v7 I just took what the spider net driver was doing for its RX DMA buffer allocation and applied that to the gelic driver. I think your comment about aligning the DMA buffer is addressed by the lines: offset = ((unsigned long)descr->skb->data) & (GELIC_NET_RXBUF_ALIGN - 1); if (offset) skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset); I tried to do some thorough testing of v7, and I couldn't get it to error. -Geoff
Hi, I've gone back to setting up the napi routines. On 2/28/23 08:12, Alexander Lobakin wrote: > From: Geoff Levand <geoff@infradead.org> > Date: Sun, 26 Feb 2023 02:25:42 +0000 > >> The Gelic Ethernet device needs to have the RX sk_buffs aligned to >> GELIC_NET_RXBUF_ALIGN and the length of the RX sk_buffs must be a >> multiple of GELIC_NET_RXBUF_ALIGN. >> + >> + napi_buff = napi_alloc_frag_align(GELIC_NET_MAX_MTU, >> + GELIC_NET_RXBUF_ALIGN); >> + >> + descr->skb = napi_build_skb(napi_buff, GELIC_NET_MAX_MTU); > > You're mixing two, no, three things here. > > 1. MTU. I.e. max L3+ payload len. It doesn't include Eth header, VLAN > and FCS. > 2. Max frame size. It is MTU + the abovementioned. Usually it's > something like `some power of two - 1`. > 3. skb truesize. > It is: frame size (i.e. 2) + headroom (usually %NET_SKB_PAD when > !XDP, %XDP_PACKET_HEADROOM otherwise, plus %NET_IP_ALIGN) + tailroom > (SKB_DATA_ALIGN(sizeof(struct skb_shared_info), see > SKB_WITH_OVERHEAD() and adjacent macros). > > I'm not sure whether your definition is the first or the second... or > maybe third? You had 1522, but then changed to 1920? You must pass the > third to napi_build_skb(). > So you should calculate the truesize first, then allocate a frag and > build an skb. Then skb->data will point to the free space with the > length of your max frame size. > And the truesize calculation might be not just a hardcoded value, but an > expression where you add all those head- and tailrooms, so that they > will be calculated depending on the platform's configuration. > > Your current don't reserve any space as headroom, so that frames / HW > visible part starts at the very beginning of a frag. It's okay, I mean, > there will be reallocations when the stack needs more headroom, but > definitely not something to kill the system. You could leave it to an > improvement series in the future*. > But it either way *must* include tailroom, e.g. > SKB_DATA_ALIGN(see_above), otherwise your HW might overwrite kernel > structures. > > * given that the required HW alignment is 128, I'd just allocate 128 > bytes more and then do `skb_reserve(skb, RXBUF_HW_ALIGN` right after > napi_build_skb() to avoid reallocations. Looking at the docs for the PS3's gelic network device I found that the DMA buffer it uses has a fixed layout: VLAN Data 2 bytes Dest MAC 6 bytes Source MAC 6 bytes Type/Length 2 bytes DATA 46-2294 bytes So, the max DMA buffer size is 2310, which I guess is the same as the MAX_FRAME size, which is given as 2312. That's about 18.05*128. So if the napi_buff size is 19*128 = 2432 and the start aligned to 128, that should give me what I need: #define GELIC_NET_RXBUF_ALIGN 128 static const unsigned int napi_buff_size = 19 * GELIC_NET_RXBUF_ALIGN; napi_buff = napi_alloc_frag_align(napi_buff_size, GELIC_NET_RXBUF_ALIGN); descr->skb = napi_build_skb(napi_buff, napi_buff_size); cpu_addr = dma_map_single(dev, napi_buff, napi_buff_size, DMA_FROM_DEVICE); You can find the actual patch here: https://git.kernel.org/pub/scm/linux/kernel/git/geoff/ps3-linux.git/commit/?h=ps3-queue-v6.3--gelic-work&id=629de5a5d2875354c5d48fca7f5c1d24f4bf3a8e I did some rigorous testing with this and didn't have any problems. -Geoff
diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.c b/drivers/net/ethernet/toshiba/ps3_gelic_net.c index cf8de8a7a8a1..7e12e041acd3 100644 --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.c +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.c @@ -365,51 +365,61 @@ static int gelic_card_init_chain(struct gelic_card *card, * * allocates a new rx skb, iommu-maps it and attaches it to the descriptor. * Activate the descriptor state-wise + * + * Gelic RX sk_buffs must be aligned to GELIC_NET_RXBUF_ALIGN and the length + * must be a multiple of GELIC_NET_RXBUF_ALIGN. */ static int gelic_descr_prepare_rx(struct gelic_card *card, struct gelic_descr *descr) { - int offset; - unsigned int bufsize; + struct device *dev = ctodev(card); + void *napi_buff; if (gelic_descr_get_status(descr) != GELIC_DESCR_DMA_NOT_IN_USE) dev_info(ctodev(card), "%s: ERROR status\n", __func__); - /* we need to round up the buffer size to a multiple of 128 */ - bufsize = ALIGN(GELIC_NET_MAX_MTU, GELIC_NET_RXBUF_ALIGN); - - /* and we need to have it 128 byte aligned, therefore we allocate a - * bit more */ - descr->skb = dev_alloc_skb(bufsize + GELIC_NET_RXBUF_ALIGN - 1); - if (!descr->skb) { - descr->buf_addr = 0; /* tell DMAC don't touch memory */ - return -ENOMEM; - } - descr->buf_size = cpu_to_be32(bufsize); + descr->dmac_cmd_status = 0; descr->result_size = 0; descr->valid_size = 0; descr->data_error = 0; + descr->buf_size = cpu_to_be32(GELIC_NET_MAX_MTU); + + napi_buff = napi_alloc_frag_align(GELIC_NET_MAX_MTU, + GELIC_NET_RXBUF_ALIGN); + + if (unlikely(!napi_buff)) { + descr->skb = NULL; + descr->buf_addr = 0; + descr->buf_size = 0; + return -ENOMEM; + } + + descr->skb = napi_build_skb(napi_buff, GELIC_NET_MAX_MTU); + + if (unlikely(!descr->skb)) { + skb_free_frag(napi_buff); + descr->skb = NULL; + descr->buf_addr = 0; + descr->buf_size = 0; + return -ENOMEM; + } + + descr->buf_addr = cpu_to_be32(dma_map_single(dev, napi_buff, + GELIC_NET_MAX_MTU, DMA_FROM_DEVICE)); - offset = ((unsigned long)descr->skb->data) & - (GELIC_NET_RXBUF_ALIGN - 1); - if (offset) - skb_reserve(descr->skb, GELIC_NET_RXBUF_ALIGN - offset); - /* io-mmu-map the skb */ - descr->buf_addr = cpu_to_be32(dma_map_single(ctodev(card), - descr->skb->data, - GELIC_NET_MAX_MTU, - DMA_FROM_DEVICE)); if (!descr->buf_addr) { - dev_kfree_skb_any(descr->skb); + skb_free_frag(napi_buff); descr->skb = NULL; - dev_info(ctodev(card), - "%s:Could not iommu-map rx buffer\n", __func__); + descr->buf_addr = 0; + descr->buf_size = 0; + dev_err_once(dev, "%s:Could not iommu-map rx buffer\n", + __func__); gelic_descr_set_status(descr, GELIC_DESCR_DMA_NOT_IN_USE); return -ENOMEM; - } else { - gelic_descr_set_status(descr, GELIC_DESCR_DMA_CARDOWNED); - return 0; } + + gelic_descr_set_status(descr, GELIC_DESCR_DMA_CARDOWNED); + return 0; } /** diff --git a/drivers/net/ethernet/toshiba/ps3_gelic_net.h b/drivers/net/ethernet/toshiba/ps3_gelic_net.h index 68f324ed4eaf..d3868eb7234c 100644 --- a/drivers/net/ethernet/toshiba/ps3_gelic_net.h +++ b/drivers/net/ethernet/toshiba/ps3_gelic_net.h @@ -19,7 +19,7 @@ #define GELIC_NET_RX_DESCRIPTORS 128 /* num of descriptors */ #define GELIC_NET_TX_DESCRIPTORS 128 /* num of descriptors */ -#define GELIC_NET_MAX_MTU VLAN_ETH_FRAME_LEN +#define GELIC_NET_MAX_MTU 1920 #define GELIC_NET_MIN_MTU VLAN_ETH_ZLEN #define GELIC_NET_RXBUF_ALIGN 128 #define GELIC_CARD_RX_CSUM_DEFAULT 1 /* hw chksum */
The Gelic Ethernet device needs to have the RX sk_buffs aligned to GELIC_NET_RXBUF_ALIGN and the length of the RX sk_buffs must be a multiple of GELIC_NET_RXBUF_ALIGN. The current Gelic Ethernet driver was not allocating sk_buffs large enough to allow for this alignment. Fixes various randomly occurring runtime network errors. Fixes: 02c1889166b4 ("ps3: gigabit ethernet driver for PS3, take3") Signed-off-by: Geoff Levand <geoff@infradead.org> --- drivers/net/ethernet/toshiba/ps3_gelic_net.c | 66 +++++++++++--------- drivers/net/ethernet/toshiba/ps3_gelic_net.h | 2 +- 2 files changed, 39 insertions(+), 29 deletions(-)