Message ID | 20220330163703.25086-1-ap420073@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 059a47f1da93811d37533556d67e72f2261b1127 |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] net: sfc: add missing xdp queue reinitialization | expand |
Hello: This patch was applied to netdev/net.git (master) by David S. Miller <davem@davemloft.net>: On Wed, 30 Mar 2022 16:37:03 +0000 you wrote: > After rx/tx ring buffer size is changed, kernel panic occurs when > it acts XDP_TX or XDP_REDIRECT. > > When tx/rx ring buffer size is changed(ethtool -G), sfc driver > reallocates and reinitializes rx and tx queues and their buffer > (tx_queue->buffer). > But it misses reinitializing xdp queues(efx->xdp_tx_queues). > So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized > tx_queue->buffer. > > [...] Here is the summary with links: - [net,v2] net: sfc: add missing xdp queue reinitialization https://git.kernel.org/netdev/net/c/059a47f1da93 You are awesome, thank you!
Hi Taehee, Thanks for looking into this. Unfortunately efx_realloc_channels() has turned out to be quite fragile over the years, so I'm keen to remove it in stead of patching it up all the time. Could you try the patch below please? If it works ok for you as well we'll be able to remove efx_realloc_channels(). The added advantage of this approach is that the netdev notifiers get informed of the change. Regards, Martin Habets <habetsm.xilinx@gmail.com> --- drivers/net/ethernet/sfc/ethtool.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c index 48506373721a..8cfbe61737bb 100644 --- a/drivers/net/ethernet/sfc/ethtool.c +++ b/drivers/net/ethernet/sfc/ethtool.c @@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device *net_dev, { struct efx_nic *efx = netdev_priv(net_dev); u32 txq_entries; + int rc = 0; if (ring->rx_mini_pending || ring->rx_jumbo_pending || ring->rx_pending > EFX_MAX_DMAQ_SIZE || @@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device *net_dev, "increasing TX queue size to minimum of %u\n", txq_entries); - return efx_realloc_channels(efx, ring->rx_pending, txq_entries); + /* Apply the new settings */ + efx->rxq_entries = ring->rx_pending; + efx->txq_entries = ring->tx_pending; + + /* Update the datapath with the new settings if the interface is up */ + if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) { + dev_close(net_dev); + rc = dev_open(net_dev, NULL); + } + + return rc; } static void efx_ethtool_get_wol(struct net_device *net_dev,
On 4/1/22 20:06, Martin Habets wrote: Hi Martin, Thank you so much for your review! > Hi Taehee, > > Thanks for looking into this. Unfortunately efx_realloc_channels() > has turned out to be quite fragile over the years, so I'm > keen to remove it in stead of patching it up all the time. I agree with you. efx_realloc_channels() is too complex. > > Could you try the patch below please? > If it works ok for you as well we'll be able to remove > efx_realloc_channels(). The added advantage of this approach > is that the netdev notifiers get informed of the change. I tested your patch and I found a page reference count problem. How to test: 1. set up XDP_TX 2. traffic on 3. traffic off 4. ring buffer size change 5. loop from 2 to 4. [ 87.836195][ T72] BUG: Bad page state in process kworker/u16:1 pfn:125445 [ 87.843356][ T72] page:000000003725f642 refcount:-2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x125445 [ 87.853783][ T72] flags: 0x200000000000000(node=0|zone=2) [ 87.859391][ T72] raw: 0200000000000000 dead000000000100 dead000000000122 0000000000000000 [ 87.867928][ T72] raw: 0000000000000000 0000000000000000 fffffffeffffffff 0000000000000000 [ 87.876569][ T72] page dumped because: nonzero _refcount [ 87.882125][ T72] Modules linked in: af_packet sfc ixgbe mtd atlantic coretemp mdio hwmon sch_fq_codel msr bpf_prelx [ 87.895331][ T72] CPU: 0 PID: 72 Comm: kworker/u16:1 Not tainted 5.17.0+ #62 dbf33652f22e5147659e7e2472bb962779c4833 [ 87.906350][ T72] Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 [ 87.915360][ T72] Workqueue: netns cleanup_net [ 87.920087][ T72] Call Trace: [ 87.923311][ T72] <TASK> [ 87.926188][ T72] dump_stack_lvl+0x56/0x7b [ 87.930597][ T72] bad_page.cold.125+0x63/0x93 [ 87.935288][ T72] free_pcppages_bulk+0x63c/0x6f0 [ 87.940232][ T72] free_unref_page+0x8b/0xf0 [ 87.944749][ T72] efx_fini_rx_queue+0x15f/0x210 [sfc 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] [ 87.953756][ T72] efx_stop_channels+0xef/0x1b0 [sfc 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] [ 87.962699][ T72] efx_net_stop+0x4d/0x60 [sfc 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] [ 87.971029][ T72] __dev_close_many+0x8b/0xf0 [ 87.975618][ T72] dev_close_many+0x7d/0x120 [ ... ] In addition, I would like to share issues that I'm currently looking into: 1. TX DMA error when interface down/up or ring buffer size changes, TX DMA error would occur because tx_queue can be used before initialization. But It will be fixed by the below patch. static void efx_ethtool_get_wol(struct net_device *net_dev, diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c index d16e031e95f4..6983799e1c05 100644 --- a/drivers/net/ethernet/sfc/tx.c +++ b/drivers/net/ethernet/sfc/tx.c @@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs, if (unlikely(!tx_queue)) return -EINVAL; + if (!tx_queue->initialised) + return -EINVAL; + if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED) HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu); diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c index d530cde2b864..9bc8281b7f5b 100644 --- a/drivers/net/ethernet/sfc/tx_common.c +++ b/drivers/net/ethernet/sfc/tx_common.c @@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue) netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev, "shutting down TX queue %d\n", tx_queue->queue); + tx_queue->initialised = false; + if (!tx_queue->buffer) return; After your patch, unfortunately, it can't fix ring buffer size change case. It can fix only interface down/up case. I will look into this more. 2. Memory leak There is a memory leak in ring buffer size change logic. reproducer: while : do ethtool -G <interface name> rx 2048 tx 2048 ethtool -G <interface name> rx 1024 tx 1024 done Thanks a lot, Taehee Yoo > > Regards, > Martin Habets <habetsm.xilinx@gmail.com> > > --- > drivers/net/ethernet/sfc/ethtool.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c > index 48506373721a..8cfbe61737bb 100644 > --- a/drivers/net/ethernet/sfc/ethtool.c > +++ b/drivers/net/ethernet/sfc/ethtool.c > @@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device *net_dev, > { > struct efx_nic *efx = netdev_priv(net_dev); > u32 txq_entries; > + int rc = 0; > > if (ring->rx_mini_pending || ring->rx_jumbo_pending || > ring->rx_pending > EFX_MAX_DMAQ_SIZE || > @@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device *net_dev, > "increasing TX queue size to minimum of %u\n", > txq_entries); > > - return efx_realloc_channels(efx, ring->rx_pending, txq_entries); > + /* Apply the new settings */ > + efx->rxq_entries = ring->rx_pending; > + efx->txq_entries = ring->tx_pending; > + > + /* Update the datapath with the new settings if the interface is up */ > + if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) { > + dev_close(net_dev); > + rc = dev_open(net_dev, NULL); > + } > + > + return rc; > } > > static void efx_ethtool_get_wol(struct net_device *net_dev,
Hi Taehee, On Sat, Apr 02, 2022 at 03:48:14AM +0900, Taehee Yoo wrote: > On 4/1/22 20:06, Martin Habets wrote: > > Hi Martin, > Thank you so much for your review! > > > Hi Taehee, > > > > Thanks for looking into this. Unfortunately efx_realloc_channels() > > has turned out to be quite fragile over the years, so I'm > > keen to remove it in stead of patching it up all the time. > > I agree with you. > efx_realloc_channels() is too complex. > > > > > Could you try the patch below please? > > If it works ok for you as well we'll be able to remove > > efx_realloc_channels(). The added advantage of this approach > > is that the netdev notifiers get informed of the change. > > I tested your patch and I found a page reference count problem. > How to test: > 1. set up XDP_TX > 2. traffic on > 3. traffic off > 4. ring buffer size change > 5. loop from 2 to 4. > > [ 87.836195][ T72] BUG: Bad page state in process kworker/u16:1 > pfn:125445 > [ 87.843356][ T72] page:000000003725f642 refcount:-2 mapcount:0 > mapping:0000000000000000 index:0x0 pfn:0x125445 > [ 87.853783][ T72] flags: 0x200000000000000(node=0|zone=2) > > [ 87.859391][ T72] raw: 0200000000000000 dead000000000100 > dead000000000122 0000000000000000 > [ 87.867928][ T72] raw: 0000000000000000 0000000000000000 > fffffffeffffffff 0000000000000000 > [ 87.876569][ T72] page dumped because: nonzero _refcount > > [ 87.882125][ T72] Modules linked in: af_packet sfc ixgbe mtd atlantic > coretemp mdio hwmon sch_fq_codel msr bpf_prelx > [ 87.895331][ T72] CPU: 0 PID: 72 Comm: kworker/u16:1 Not tainted > 5.17.0+ #62 dbf33652f22e5147659e7e2472bb962779c4833 > [ 87.906350][ T72] Hardware name: ASUS System Product Name/PRIME Z690-P > D4, BIOS 0603 11/01/2021 > [ 87.915360][ T72] Workqueue: netns cleanup_net > > [ 87.920087][ T72] Call Trace: > > [ 87.923311][ T72] <TASK> > > [ 87.926188][ T72] dump_stack_lvl+0x56/0x7b > > [ 87.930597][ T72] bad_page.cold.125+0x63/0x93 > > [ 87.935288][ T72] free_pcppages_bulk+0x63c/0x6f0 > > [ 87.940232][ T72] free_unref_page+0x8b/0xf0 > > [ 87.944749][ T72] efx_fini_rx_queue+0x15f/0x210 [sfc > 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] Looks to me like this is in efx_fini_rx_recycle_ring(). It could be a side effect of the memory leak you report below. If this is in efx_fini_rx_recycle_ring() I'll post a patch for that soon on a separate thread. > [ 87.953756][ T72] efx_stop_channels+0xef/0x1b0 [sfc > 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] > [ 87.962699][ T72] efx_net_stop+0x4d/0x60 [sfc > 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] > [ 87.971029][ T72] __dev_close_many+0x8b/0xf0 > > [ 87.975618][ T72] dev_close_many+0x7d/0x120 > > [ ... ] > > > In addition, I would like to share issues that I'm currently looking into: > 1. TX DMA error > when interface down/up or ring buffer size changes, TX DMA error would occur > because tx_queue can be used before initialization. > But It will be fixed by the below patch. > > static void efx_ethtool_get_wol(struct net_device *net_dev, > diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c > index d16e031e95f4..6983799e1c05 100644 > --- a/drivers/net/ethernet/sfc/tx.c > +++ b/drivers/net/ethernet/sfc/tx.c > @@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, > struct xdp_frame **xdpfs, > if (unlikely(!tx_queue)) > return -EINVAL; > > + if (!tx_queue->initialised) > + return -EINVAL; > + > if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED) > HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu); > > diff --git a/drivers/net/ethernet/sfc/tx_common.c > b/drivers/net/ethernet/sfc/tx_common.c > index d530cde2b864..9bc8281b7f5b 100644 > --- a/drivers/net/ethernet/sfc/tx_common.c > +++ b/drivers/net/ethernet/sfc/tx_common.c > @@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue) > netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev, > "shutting down TX queue %d\n", tx_queue->queue); > > + tx_queue->initialised = false; > + > if (!tx_queue->buffer) > return; Looks ok, but xmit_hard should never be called on an interface that is down. Makes me wonder if we have a seqence issue in our ndo_stop API. > > After your patch, unfortunately, it can't fix ring buffer size change case. > It can fix only interface down/up case. > I will look into this more. > > 2. Memory leak > There is a memory leak in ring buffer size change logic. > reproducer: > while : > do > ethtool -G <interface name> rx 2048 tx 2048 > ethtool -G <interface name> rx 1024 tx 1024 > done Is this with my patch or only with yours? Thanks a lot for testing this. Martin > Thanks a lot, > Taehee Yoo > > > > > Regards, > > Martin Habets <habetsm.xilinx@gmail.com> > > > > --- > > drivers/net/ethernet/sfc/ethtool.c | 13 ++++++++++++- > > 1 file changed, 12 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/ethernet/sfc/ethtool.c > b/drivers/net/ethernet/sfc/ethtool.c > > index 48506373721a..8cfbe61737bb 100644 > > --- a/drivers/net/ethernet/sfc/ethtool.c > > +++ b/drivers/net/ethernet/sfc/ethtool.c > > @@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device *net_dev, > > { > > struct efx_nic *efx = netdev_priv(net_dev); > > u32 txq_entries; > > + int rc = 0; > > > > if (ring->rx_mini_pending || ring->rx_jumbo_pending || > > ring->rx_pending > EFX_MAX_DMAQ_SIZE || > > @@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device *net_dev, > > "increasing TX queue size to minimum of %u\n", > > txq_entries); > > > > - return efx_realloc_channels(efx, ring->rx_pending, txq_entries); > > + /* Apply the new settings */ > > + efx->rxq_entries = ring->rx_pending; > > + efx->txq_entries = ring->tx_pending; > > + > > + /* Update the datapath with the new settings if the interface is up */ > > + if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) { > > + dev_close(net_dev); > > + rc = dev_open(net_dev, NULL); > > + } > > + > > + return rc; > > } > > > > static void efx_ethtool_get_wol(struct net_device *net_dev,
On 4/4/22 19:36, Martin Habets wrote: Hi Martin, > Hi Taehee, > > On Sat, Apr 02, 2022 at 03:48:14AM +0900, Taehee Yoo wrote: >> On 4/1/22 20:06, Martin Habets wrote: >> >> Hi Martin, >> Thank you so much for your review! >> >>> Hi Taehee, >>> >>> Thanks for looking into this. Unfortunately efx_realloc_channels() >>> has turned out to be quite fragile over the years, so I'm >>> keen to remove it in stead of patching it up all the time. >> >> I agree with you. >> efx_realloc_channels() is too complex. >> >>> >>> Could you try the patch below please? >>> If it works ok for you as well we'll be able to remove >>> efx_realloc_channels(). The added advantage of this approach >>> is that the netdev notifiers get informed of the change. >> >> I tested your patch and I found a page reference count problem. >> How to test: >> 1. set up XDP_TX >> 2. traffic on >> 3. traffic off >> 4. ring buffer size change >> 5. loop from 2 to 4. >> >> [ 87.836195][ T72] BUG: Bad page state in process kworker/u16:1 >> pfn:125445 >> [ 87.843356][ T72] page:000000003725f642 refcount:-2 mapcount:0 >> mapping:0000000000000000 index:0x0 pfn:0x125445 >> [ 87.853783][ T72] flags: 0x200000000000000(node=0|zone=2) >> >> [ 87.859391][ T72] raw: 0200000000000000 dead000000000100 >> dead000000000122 0000000000000000 >> [ 87.867928][ T72] raw: 0000000000000000 0000000000000000 >> fffffffeffffffff 0000000000000000 >> [ 87.876569][ T72] page dumped because: nonzero _refcount >> >> [ 87.882125][ T72] Modules linked in: af_packet sfc ixgbe mtd atlantic >> coretemp mdio hwmon sch_fq_codel msr bpf_prelx >> [ 87.895331][ T72] CPU: 0 PID: 72 Comm: kworker/u16:1 Not tainted >> 5.17.0+ #62 dbf33652f22e5147659e7e2472bb962779c4833 >> [ 87.906350][ T72] Hardware name: ASUS System Product Name/PRIME Z690-P >> D4, BIOS 0603 11/01/2021 >> [ 87.915360][ T72] Workqueue: netns cleanup_net >> >> [ 87.920087][ T72] Call Trace: >> >> [ 87.923311][ T72] <TASK> >> >> [ 87.926188][ T72] dump_stack_lvl+0x56/0x7b >> >> [ 87.930597][ T72] bad_page.cold.125+0x63/0x93 >> >> [ 87.935288][ T72] free_pcppages_bulk+0x63c/0x6f0 >> >> [ 87.940232][ T72] free_unref_page+0x8b/0xf0 >> >> [ 87.944749][ T72] efx_fini_rx_queue+0x15f/0x210 [sfc >> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] > > Looks to me like this is in efx_fini_rx_recycle_ring(). > It could be a side effect of the memory leak you report below. > If this is in efx_fini_rx_recycle_ring() I'll post a patch for > that soon on a separate thread. Thanks for that! > >> [ 87.953756][ T72] efx_stop_channels+0xef/0x1b0 [sfc >> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] >> [ 87.962699][ T72] efx_net_stop+0x4d/0x60 [sfc >> 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e] >> [ 87.971029][ T72] __dev_close_many+0x8b/0xf0 >> >> [ 87.975618][ T72] dev_close_many+0x7d/0x120 >> >> [ ... ] >> >> >> In addition, I would like to share issues that I'm currently looking into: >> 1. TX DMA error >> when interface down/up or ring buffer size changes, TX DMA error would occur >> because tx_queue can be used before initialization. >> But It will be fixed by the below patch. >> >> static void efx_ethtool_get_wol(struct net_device *net_dev, >> diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c >> index d16e031e95f4..6983799e1c05 100644 >> --- a/drivers/net/ethernet/sfc/tx.c >> +++ b/drivers/net/ethernet/sfc/tx.c >> @@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, >> struct xdp_frame **xdpfs, >> if (unlikely(!tx_queue)) >> return -EINVAL; >> >> + if (!tx_queue->initialised) >> + return -EINVAL; >> + >> if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED) >> HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu); >> >> diff --git a/drivers/net/ethernet/sfc/tx_common.c >> b/drivers/net/ethernet/sfc/tx_common.c >> index d530cde2b864..9bc8281b7f5b 100644 >> --- a/drivers/net/ethernet/sfc/tx_common.c >> +++ b/drivers/net/ethernet/sfc/tx_common.c >> @@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue) >> netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev, >> "shutting down TX queue %d\n", tx_queue->queue); >> >> + tx_queue->initialised = false; >> + >> if (!tx_queue->buffer) >> return; > > Looks ok, but xmit_hard should never be called on an interface that > is down. Makes me wonder if we have a seqence issue in our ndo_stop API. > I think it's an XDP tx_queue specific issue. Sorry that I didn't provide that information. When a packet is received and it acts XDP_TX, it calls efx_xdp_tx_buffers() and it uses xdp tx_queue and sends it to hardware directly. So, ->ndo_start_xmit is not called. And that bug occurs when the first interface up. ->ndo_stop has never called, So I think this is not an issue of ->ndo_stop. >> >> After your patch, unfortunately, it can't fix ring buffer size change case. >> It can fix only interface down/up case. >> I will look into this more. >> >> 2. Memory leak >> There is a memory leak in ring buffer size change logic. >> reproducer: >> while : >> do >> ethtool -G <interface name> rx 2048 tx 2048 >> ethtool -G <interface name> rx 1024 tx 1024 >> done > > Is this with my patch or only with yours? > Thanks a lot for testing this. > Memory leak still occurs with your patch(do not use efx_realloc_channels()) Thanks a lot, Taehee Yoo > Martin > >> Thanks a lot, >> Taehee Yoo >> >>> >>> Regards, >>> Martin Habets <habetsm.xilinx@gmail.com> >>> >>> --- >>> drivers/net/ethernet/sfc/ethtool.c | 13 ++++++++++++- >>> 1 file changed, 12 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/net/ethernet/sfc/ethtool.c >> b/drivers/net/ethernet/sfc/ethtool.c >>> index 48506373721a..8cfbe61737bb 100644 >>> --- a/drivers/net/ethernet/sfc/ethtool.c >>> +++ b/drivers/net/ethernet/sfc/ethtool.c >>> @@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device *net_dev, >>> { >>> struct efx_nic *efx = netdev_priv(net_dev); >>> u32 txq_entries; >>> + int rc = 0; >>> >>> if (ring->rx_mini_pending || ring->rx_jumbo_pending || >>> ring->rx_pending > EFX_MAX_DMAQ_SIZE || >>> @@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device *net_dev, >>> "increasing TX queue size to minimum of %u\n", >>> txq_entries); >>> >>> - return efx_realloc_channels(efx, ring->rx_pending, txq_entries); >>> + /* Apply the new settings */ >>> + efx->rxq_entries = ring->rx_pending; >>> + efx->txq_entries = ring->tx_pending; >>> + >>> + /* Update the datapath with the new settings if the interface is up */ >>> + if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) { >>> + dev_close(net_dev); >>> + rc = dev_open(net_dev, NULL); >>> + } >>> + >>> + return rc; >>> } >>> >>> static void efx_ethtool_get_wol(struct net_device *net_dev, >
diff --git a/drivers/net/ethernet/sfc/efx_channels.c b/drivers/net/ethernet/sfc/efx_channels.c index d6fdcdc530ca..b29ca69aeca5 100644 --- a/drivers/net/ethernet/sfc/efx_channels.c +++ b/drivers/net/ethernet/sfc/efx_channels.c @@ -789,6 +789,85 @@ void efx_remove_channels(struct efx_nic *efx) kfree(efx->xdp_tx_queues); } +static int efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number, + struct efx_tx_queue *tx_queue) +{ + if (xdp_queue_number >= efx->xdp_tx_queue_count) + return -EINVAL; + + netif_dbg(efx, drv, efx->net_dev, + "Channel %u TXQ %u is XDP %u, HW %u\n", + tx_queue->channel->channel, tx_queue->label, + xdp_queue_number, tx_queue->queue); + efx->xdp_tx_queues[xdp_queue_number] = tx_queue; + return 0; +} + +static void efx_set_xdp_channels(struct efx_nic *efx) +{ + struct efx_tx_queue *tx_queue; + struct efx_channel *channel; + unsigned int next_queue = 0; + int xdp_queue_number = 0; + int rc; + + /* We need to mark which channels really have RX and TX + * queues, and adjust the TX queue numbers if we have separate + * RX-only and TX-only channels. + */ + efx_for_each_channel(channel, efx) { + if (channel->channel < efx->tx_channel_offset) + continue; + + if (efx_channel_is_xdp_tx(channel)) { + efx_for_each_channel_tx_queue(tx_queue, channel) { + tx_queue->queue = next_queue++; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, + tx_queue); + if (rc == 0) + xdp_queue_number++; + } + } else { + efx_for_each_channel_tx_queue(tx_queue, channel) { + tx_queue->queue = next_queue++; + netif_dbg(efx, drv, efx->net_dev, + "Channel %u TXQ %u is HW %u\n", + channel->channel, tx_queue->label, + tx_queue->queue); + } + + /* If XDP is borrowing queues from net stack, it must + * use the queue with no csum offload, which is the + * first one of the channel + * (note: tx_queue_by_type is not initialized yet) + */ + if (efx->xdp_txq_queues_mode == + EFX_XDP_TX_QUEUES_BORROWED) { + tx_queue = &channel->tx_queue[0]; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, + tx_queue); + if (rc == 0) + xdp_queue_number++; + } + } + } + WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED && + xdp_queue_number != efx->xdp_tx_queue_count); + WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED && + xdp_queue_number > efx->xdp_tx_queue_count); + + /* If we have more CPUs than assigned XDP TX queues, assign the already + * existing queues to the exceeding CPUs + */ + next_queue = 0; + while (xdp_queue_number < efx->xdp_tx_queue_count) { + tx_queue = efx->xdp_tx_queues[next_queue++]; + rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); + if (rc == 0) + xdp_queue_number++; + } +} + int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) { struct efx_channel *other_channel[EFX_MAX_CHANNELS], *channel; @@ -860,6 +939,7 @@ int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) efx_init_napi_channel(efx->channel[i]); } + efx_set_xdp_channels(efx); out: /* Destroy unused channel structures */ for (i = 0; i < efx->n_channels; i++) { @@ -892,26 +972,9 @@ int efx_realloc_channels(struct efx_nic *efx, u32 rxq_entries, u32 txq_entries) goto out; } -static inline int -efx_set_xdp_tx_queue(struct efx_nic *efx, int xdp_queue_number, - struct efx_tx_queue *tx_queue) -{ - if (xdp_queue_number >= efx->xdp_tx_queue_count) - return -EINVAL; - - netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is XDP %u, HW %u\n", - tx_queue->channel->channel, tx_queue->label, - xdp_queue_number, tx_queue->queue); - efx->xdp_tx_queues[xdp_queue_number] = tx_queue; - return 0; -} - int efx_set_channels(struct efx_nic *efx) { - struct efx_tx_queue *tx_queue; struct efx_channel *channel; - unsigned int next_queue = 0; - int xdp_queue_number; int rc; efx->tx_channel_offset = @@ -929,61 +992,14 @@ int efx_set_channels(struct efx_nic *efx) return -ENOMEM; } - /* We need to mark which channels really have RX and TX - * queues, and adjust the TX queue numbers if we have separate - * RX-only and TX-only channels. - */ - xdp_queue_number = 0; efx_for_each_channel(channel, efx) { if (channel->channel < efx->n_rx_channels) channel->rx_queue.core_index = channel->channel; else channel->rx_queue.core_index = -1; - - if (channel->channel >= efx->tx_channel_offset) { - if (efx_channel_is_xdp_tx(channel)) { - efx_for_each_channel_tx_queue(tx_queue, channel) { - tx_queue->queue = next_queue++; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } - } else { - efx_for_each_channel_tx_queue(tx_queue, channel) { - tx_queue->queue = next_queue++; - netif_dbg(efx, drv, efx->net_dev, "Channel %u TXQ %u is HW %u\n", - channel->channel, tx_queue->label, - tx_queue->queue); - } - - /* If XDP is borrowing queues from net stack, it must use the queue - * with no csum offload, which is the first one of the channel - * (note: channel->tx_queue_by_type is not initialized yet) - */ - if (efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_BORROWED) { - tx_queue = &channel->tx_queue[0]; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } - } - } } - WARN_ON(efx->xdp_txq_queues_mode == EFX_XDP_TX_QUEUES_DEDICATED && - xdp_queue_number != efx->xdp_tx_queue_count); - WARN_ON(efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED && - xdp_queue_number > efx->xdp_tx_queue_count); - /* If we have more CPUs than assigned XDP TX queues, assign the already - * existing queues to the exceeding CPUs - */ - next_queue = 0; - while (xdp_queue_number < efx->xdp_tx_queue_count) { - tx_queue = efx->xdp_tx_queues[next_queue++]; - rc = efx_set_xdp_tx_queue(efx, xdp_queue_number, tx_queue); - if (rc == 0) - xdp_queue_number++; - } + efx_set_xdp_channels(efx); rc = netif_set_real_num_tx_queues(efx->net_dev, efx->n_tx_channels); if (rc)
After rx/tx ring buffer size is changed, kernel panic occurs when it acts XDP_TX or XDP_REDIRECT. When tx/rx ring buffer size is changed(ethtool -G), sfc driver reallocates and reinitializes rx and tx queues and their buffer (tx_queue->buffer). But it misses reinitializing xdp queues(efx->xdp_tx_queues). So, while it is acting XDP_TX or XDP_REDIRECT, it uses the uninitialized tx_queue->buffer. A new function efx_set_xdp_channels() is separated from efx_set_channels() to handle only xdp queues. Splat looks like: BUG: kernel NULL pointer dereference, address: 000000000000002a #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#4] PREEMPT SMP NOPTI RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D 5.17.0+ #55 e8beeee8289528f11357029357cf Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 RSP: 0018:ffff92f121e45c60 EFLAGS: 00010297 RIP: 0010:efx_tx_map_chunk+0x54/0x90 [sfc] RAX: 0000000000000040 RBX: ffff92ea506895c0 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001139b10ce RDI: ffff92ea506895c0 RBP: ffffffffc0358a80 R08: 00000001139b110d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001139b10ce R15: ffff92ea506895c0 FS: 0000000000000000(0000) GS:ffff92f121ec0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Code: 48 8b 8d a8 01 00 00 48 8d 14 52 4c 8d 2c d0 44 89 e0 48 85 c9 74 0e 44 89 e2 4c 89 f6 48 80 CR2: 000000000000002a CR3: 00000003e6810004 CR4: 00000000007706e0 RSP: 0018:ffff92f121e85c60 EFLAGS: 00010297 PKRU: 55555554 RAX: 0000000000000040 RBX: ffff92ea50689700 RCX: ffffffffc0330870 RDX: 0000000000000001 RSI: 00000001145a90ce RDI: ffff92ea50689700 RBP: ffffffffc0358a80 R08: 00000001145a910d R09: 0000000000000000 R10: 0000000000000001 R11: ffff92ea414c0088 R12: 0000000000000040 R13: 0000000000000018 R14: 00000001145a90ce R15: ffff92ea50689700 FS: 0000000000000000(0000) GS:ffff92f121e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000000002a CR3: 00000003e6810005 CR4: 00000000007706e0 PKRU: 55555554 Call Trace: <IRQ> efx_xdp_tx_buffers+0x12b/0x3d0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] __efx_rx_packet+0x5c3/0x930 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_rx_packet+0x28c/0x2e0 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] efx_ef10_ev_process+0x5f8/0xf40 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] ? enqueue_task_fair+0x95/0x550 efx_poll+0xc4/0x360 [sfc 84c94b8e32d44d296c17e10a634d3ad454de4ba5] Fixes: 3990a8fffbda ("sfc: allocate channels for XDP tx queues") Signed-off-by: Taehee Yoo <ap420073@gmail.com> --- v2: Do not use inline in .c file drivers/net/ethernet/sfc/efx_channels.c | 146 +++++++++++++----------- 1 file changed, 81 insertions(+), 65 deletions(-)