mbox series

[PATCHv2,0/2] ipoib bugfix

Message ID 20231121130316.126364-1-jinpu.wang@ionos.com (mailing list archive)
Headers show
Series ipoib bugfix | expand

Message

Jinpu Wang Nov. 21, 2023, 1:03 p.m. UTC
We run into queue timeout often with call trace as such:
NETDEV WATCHDOG: ib0.beef (): transmit queue 26 timed out
Call Trace:
call_timer_fn+0x27/0x100
__run_timers.part.0+0x1be/0x230
? mlx5_cq_tasklet_cb+0x6d/0x140 [mlx5_core]
run_timer_softirq+0x26/0x50
__do_softirq+0xbc/0x26d
asm_call_irq_on_stack+0xf/0x20
ib0.beef: transmit timeout: latency 10 msecs
ib0.beef: queue stopped 0, tx_head 0, tx_tail 0, global_tx_head 0, global_tx_tail 0

The last two message repeated for days.

After cross check with Mellanox OFED, I noticed some bugfix are missing in
upstream, hence I take the liberty to send them out.

Thx!

v2:
Fix the build error due to napi api change in v6.7

Jack Wang (2):
  ipoib: Fix error code return in ipoib_mcast_join
  ipoib: Add tx timeout work to recover queue stop situation

 drivers/infiniband/ulp/ipoib/ipoib.h          |  4 +++
 drivers/infiniband/ulp/ipoib/ipoib_ib.c       | 26 ++++++++++++++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c     | 33 +++++++++++++++++--
 .../infiniband/ulp/ipoib/ipoib_multicast.c    |  1 +
 4 files changed, 61 insertions(+), 3 deletions(-)

Comments

Leon Romanovsky Nov. 26, 2023, 9:34 a.m. UTC | #1
On Tue, 21 Nov 2023 14:03:14 +0100, Jack Wang wrote:
> We run into queue timeout often with call trace as such:
> NETDEV WATCHDOG: ib0.beef (): transmit queue 26 timed out
> Call Trace:
> call_timer_fn+0x27/0x100
> __run_timers.part.0+0x1be/0x230
> ? mlx5_cq_tasklet_cb+0x6d/0x140 [mlx5_core]
> run_timer_softirq+0x26/0x50
> __do_softirq+0xbc/0x26d
> asm_call_irq_on_stack+0xf/0x20
> ib0.beef: transmit timeout: latency 10 msecs
> ib0.beef: queue stopped 0, tx_head 0, tx_tail 0, global_tx_head 0, global_tx_tail 0
> 
> [...]

Applied, thanks!

[1/2] ipoib: Fix error code return in ipoib_mcast_join
      https://git.kernel.org/rdma/rdma/c/753fff78f43070
[2/2] ipoib: Add tx timeout work to recover queue stop situation
      https://git.kernel.org/rdma/rdma/c/50af5d12f7e24b

Best regards,