Message ID | 20230906031148.16774-421-nic_swsd@realtek.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net,v2] r8152: avoid the driver drops a lot of packets |

On Wed, 6 Sep 2023 11:11:48 +0800 Hayes Wang wrote:
> Stop submitting rx if the driver queues more than 256 packets.
>
> If the hardware is faster than the software, the driver starts queuing
> packets, and it starts dropping them once it queues more than 1000 packets.
>
> Increasing the NAPI weight could improve the situation. However, the
> weight has been changed to 64, so we have to stop submitting rx when the
> driver queues too many packets. Then the device may send pause frames
> to slow down the receiving when its FIFO is full.

Good to see that you can repro the problem.

Before we tweak the heuristics let's make sure rx_bottom() behaves
correctly. Could you make sure that
 - we don't perform _any_ rx processing when budget is 0
   (see the NAPI documentation under Documentation/networking)
 - finish the current aggregate even if budget runs out, return
   work_done = budget in that case.
   With this change the rx_queue thing should be gone completely.
 - instead of copying the head use napi_get_frags() + napi_gro_frags();
   it gives you an skb, you just attach the page to it as a frag and
   hand it back to GRO. This makes sure you never pull data into head
   rather than just headers.

Please share the performance results with those changes.

Jakub Kicinski <kuba@kernel.org>
> Sent: Thursday, September 7, 2023 8:29 AM
[...]
> Good to see that you can repro the problem.

I haven't reproduced the problem. I just found some information about it.

> Before we tweak the heuristics let's make sure rx_bottom() behaves
> correctly. Could you make sure that
>  - we don't perform _any_ rx processing when budget is 0
>    (see the NAPI documentation under Documentation/networking)

The work_done would be 0, and napi_complete_done() wouldn't be called.
However, skb_queue_len(&tp->rx_queue) may increase. I think that is
not acceptable, right?

>  - finish the current aggregate even if budget runs out, return
>    work_done = budget in that case.
>    With this change the rx_queue thing should be gone completely.

Excuse me, I don't understand this part. I know that when there are more
packets than budget, the maximum number of packets that can be handled is
budget. That is, return work_done = budget. However, the extra packets would
be queued to rx_queue. I don't understand what you mean by "the rx_queue
thing should be gone completely". I think the current driver would return
work_done = budget and queue the other packets. I'm not sure what you
want me to change.

>  - instead of copying the head use napi_get_frags() + napi_gro_frags();
>    it gives you an skb, you just attach the page to it as a frag and
>    hand it back to GRO. This makes sure you never pull data into head
>    rather than just headers.

I will study them. Thanks.

Should I include the above changes in this patch?
I think I have to submit separate patches for them.

> Please share the performance results with those changes.

I couldn't reproduce the problem, so I can't provide results
showing the difference.

Best Regards,
Hayes

On Thu, 7 Sep 2023 07:16:50 +0000 Hayes Wang wrote:
> > Before we tweak the heuristics let's make sure rx_bottom() behaves
> > correctly. Could you make sure that
> >  - we don't perform _any_ rx processing when budget is 0
> >    (see the NAPI documentation under Documentation/networking)
>
> The work_done would be 0, and napi_complete_done() wouldn't be called.
> However, skb_queue_len(&tp->rx_queue) may increase. I think that is
> not acceptable, right?

If budget is 0 we got called by netconsole, meaning we may be holding
arbitrary locks. And we can't use napi_alloc_skb(), which is for
softirq/bh context only. We should only try to complete Tx in that
case. Since r8152_poll() doesn't handle any Tx, the right thing seems
to be to add:

	if (!budget)
		return 0;

> >  - finish the current aggregate even if budget runs out, return
> >    work_done = budget in that case.
> >    With this change the rx_queue thing should be gone completely.
>
> Excuse me, I don't understand this part. I know that when there are more
> packets than budget, the maximum number of packets that can be handled is
> budget. That is, return work_done = budget. However, the extra packets would
> be queued to rx_queue. I don't understand what you mean by "the rx_queue
> thing should be gone completely". I think the current driver would return
> work_done = budget and queue the other packets. I'm not sure what you
> want me to change.

Nothing will explode if we process a few more packets than budget
(assuming budget > 0). If we already do allocations and prepare those
skbs, there's no point holding onto them in the driver. Just send them
up the stack (and then we won't need the local rx_queue).

> >  - instead of copying the head use napi_get_frags() + napi_gro_frags();
> >    it gives you an skb, you just attach the page to it as a frag and
> >    hand it back to GRO. This makes sure you never pull data into head
> >    rather than just headers.
>
> I will study them. Thanks.
>
> Should I include the above changes in this patch?
> I think I have to submit separate patches for them.
>
> > Please share the performance results with those changes.
>
> I couldn't reproduce the problem, so I can't provide results
> showing the difference.

Hm, if you can't repro, my intuition would be to only take the patch
for budget=0 handling into net, and the rest as improvements into
net-next.

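Editorial illustration (not part of the thread): the two poll-loop rules discussed in the reply above, namely doing no rx work when budget is 0 and finishing the current aggregate even if that overshoots the budget while reporting at most budget, can be sketched as a small userspace model. This is only a sketch under stated assumptions: `struct agg`, `deliver()`, `delivered` and `model_rx_bottom()` are hypothetical names, packets are reduced to counters, and none of this is r8152 code.

```c
#include <assert.h>

/* Hypothetical stand-in for one USB rx aggregate holding several packets. */
struct agg {
	int npkts;
};

/* Number of packets handed "up the stack" in this model. */
static int delivered;

static void deliver(void)
{
	delivered++;
}

/*
 * Model of the suggested poll behaviour (an assumption-based sketch):
 *  - budget == 0: perform no rx processing at all;
 *  - otherwise process whole aggregates, finishing the current one even
 *    if that exceeds budget, and report at most budget as work done.
 * *next is the index of the first aggregate that has not been processed.
 */
static int model_rx_bottom(const struct agg *aggs, int n, int *next, int budget)
{
	int work_done = 0;

	assert(budget >= 0);

	if (!budget)
		return 0;

	while (*next < n && work_done < budget) {
		/* Never stop in the middle of an aggregate. */
		for (int i = 0; i < aggs[*next].npkts; i++) {
			deliver();
			work_done++;
		}
		(*next)++;
	}

	/* Report budget when it was reached or exceeded. */
	return work_done < budget ? work_done : budget;
}
```

With three aggregates of 40 packets each, a call with budget 0 touches nothing, while a call with budget 64 processes two whole aggregates (80 packets) and returns 64. Because every prepared packet is delivered immediately, the model needs no local queue of leftover packets.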
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 332c853ca99b..4a62e420a7be 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -2484,10 +2484,6 @@ static int rx_bottom(struct r8152 *tp, int budget)
 			unsigned int pkt_len, rx_frag_head_sz;
 			struct sk_buff *skb;
 
-			/* limit the skb numbers for rx_queue */
-			if (unlikely(skb_queue_len(&tp->rx_queue) >= 1000))
-				break;
-
 			pkt_len = le32_to_cpu(rx_desc->opts1) & RX_LEN_MASK;
 			if (pkt_len < ETH_ZLEN)
 				break;
@@ -2556,9 +2552,14 @@ static int rx_bottom(struct r8152 *tp, int budget)
 		}
 
 submit:
-		if (!ret) {
+		if (!ret && likely(skb_queue_len(&tp->rx_queue) < 256)) {
 			ret = r8152_submit_rx(tp, agg, GFP_ATOMIC);
 		} else {
+			WARN_ON_ONCE(skb_queue_len(&tp->rx_queue) >= 1000);
+			if (net_ratelimit())
+				netif_dbg(tp, rx_err, tp->netdev,
+					  "submit_rx=%d, rx_queue=%u\n",
+					  ret, skb_queue_len(&tp->rx_queue));
 			urb->actual_length = 0;
 			list_add_tail(&agg->list, next);
 		}

Stop submitting rx if the driver queues more than 256 packets.

If the hardware is faster than the software, the driver starts queuing
packets, and it starts dropping them once it queues more than 1000 packets.

Increasing the NAPI weight could improve the situation. However, the
weight has been changed to 64, so we have to stop submitting rx when the
driver queues too many packets. Then the device may send pause frames
to slow down the receiving when its FIFO is full.

Fixes: cf74eb5a5bc8 ("eth: r8152: try to use a normal budget")
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
v2:
Add WARN_ON_ONCE() and debug message for the skb_queue_len(&tp->rx_queue).

 drivers/net/usb/r8152.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)
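
Editorial illustration (not part of the patch): the gating rule the commit message describes, where an rx buffer is resubmitted only when the previous step succeeded and fewer than 256 packets are queued, can be modelled in userspace roughly as follows. The names `RX_QUEUE_PAUSE_LEN`, `enum agg_action`, `model_submit_decision()` and `model_self_check()` are hypothetical and chosen for this sketch only; the value 256 is taken from the patch.

```c
#include <assert.h>

/* Queue length at which the patch stops resubmitting rx buffers. */
#define RX_QUEUE_PAUSE_LEN 256

/* What the model does with an aggregate after processing it. */
enum agg_action {
	AGG_RESUBMIT,	/* hand the buffer back to the hardware */
	AGG_PARK	/* keep the buffer idle until the queue drains */
};

/*
 * Model of the "submit:" condition in the patch: resubmit only when
 * there was no error (ret == 0) and the software queue is short.
 * Parked buffers leave the device with fewer places to put data, so
 * its FIFO can fill and it may emit pause frames.
 */
enum agg_action model_submit_decision(int ret, unsigned int rx_queue_len)
{
	if (!ret && rx_queue_len < RX_QUEUE_PAUSE_LEN)
		return AGG_RESUBMIT;
	return AGG_PARK;
}

/* Small usage check of the boundary values. */
void model_self_check(void)
{
	assert(model_submit_decision(0, 255) == AGG_RESUBMIT);
	assert(model_submit_decision(0, 256) == AGG_PARK);
}
```

The model deliberately leaves out the WARN_ON_ONCE() and rate-limited debug message from the patch, which only report the condition and do not change the decision.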