Message ID | 20231120203501.321587-1-jinpu.wang@ionos.com (mailing list archive)
---|---
Series | bugfix for ipoib
On Mon, Nov 20, 2023 at 09:34:59PM +0100, Jack Wang wrote:
> We run into queue timeouts often, with call traces such as:
> NETDEV WATCHDOG: ib0.beef (): transmit queue 26 timed out
> Call Trace:
>  call_timer_fn+0x27/0x100
>  __run_timers.part.0+0x1be/0x230
>  ? mlx5_cq_tasklet_cb+0x6d/0x140 [mlx5_core]
>  run_timer_softirq+0x26/0x50
>  __do_softirq+0xbc/0x26d
>  asm_call_irq_on_stack+0xf/0x20
> ib0.beef: transmit timeout: latency 10 msecs
> ib0.beef: queue stopped 0, tx_head 0, tx_tail 0, global_tx_head 0, global_tx_tail 0
>
> The last two messages repeated for days.

You shouldn't get tx timeouts and fully stuck queues like that; it
suggests something else is very wrong in that system.

> After cross-checking with Mellanox OFED, I noticed some bugfixes are
> missing upstream, hence I take the liberty of sending them out.

Recovery is recovery, it is just RAS.

Jason
Hi Jason,

On Tue, Nov 21, 2023 at 1:16 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Nov 20, 2023 at 09:34:59PM +0100, Jack Wang wrote:
> > We run into queue timeouts often, with call traces such as:
> > NETDEV WATCHDOG: ib0.beef (): transmit queue 26 timed out
> > Call Trace:
> >  call_timer_fn+0x27/0x100
> >  __run_timers.part.0+0x1be/0x230
> >  ? mlx5_cq_tasklet_cb+0x6d/0x140 [mlx5_core]
> >  run_timer_softirq+0x26/0x50
> >  __do_softirq+0xbc/0x26d
> >  asm_call_irq_on_stack+0xf/0x20
> > ib0.beef: transmit timeout: latency 10 msecs
> > ib0.beef: queue stopped 0, tx_head 0, tx_tail 0, global_tx_head 0, global_tx_tail 0
> >
> > The last two messages repeated for days.
>
> You shouldn't get tx timeouts and fully stuck queues like that; it
> suggests something else is very wrong in that system.

We have hit such warnings from time to time over the years, in
different locations, but can't reproduce them at will in our staging
environment. The underlying problems are clearly there.

> > After cross-checking with Mellanox OFED, I noticed some bugfixes are
> > missing upstream, hence I take the liberty of sending them out.
> Recovery is recovery, it is just RAS.

I managed to trigger the situation with an extra debug interface:

 static DEVICE_ATTR_RW(umcast);
 
+static ssize_t timeout_store(struct device *dev, struct device_attribute *attr,
+			     const char *buf, size_t count)
+{
+	unsigned long val = simple_strtoul(buf, NULL, 0);
+
+	netif_stop_queue(to_net_dev(dev));
+	ipoib_timeout(to_net_dev(dev), val);
+
+	return count;
+}
+
 int ipoib_add_umcast_attr(struct net_device *dev)
 {
 	return device_create_file(&dev->dev, &dev_attr_umcast);
 }
 
+static DEVICE_ATTR_WO(timeout);
+
+int ipoib_add_timeout_attr(struct net_device *dev)
+{
+	return device_create_file(&dev->dev, &dev_attr_timeout);
+}
+
 static void set_base_guid(struct ipoib_dev_priv *priv, union ib_gid *gid)
 {
 	struct ipoib_dev_priv *child_priv;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 0322dc75396f..9b5dd628da2e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -148,6 +148,8 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
 		goto sysfs_failed;
 	if (ipoib_add_umcast_attr(ndev))
 		goto sysfs_failed;
+	if (ipoib_add_timeout_attr(ndev))
+		goto sysfs_failed;
 	if (device_create_file(&ndev->dev, &dev_attr_parent))
 		goto sysfs_failed;

Running iperf3 on the child interface and triggering the timeout via
sysfs, I'm able to trigger the WATCHDOG timeout without the recovery
patch, but can't trigger it with the fix applied.

I will send a v2 for the NAPI API change reported by the bot.

> Jason
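For completeness, the write-only debug attribute in the patch above would presumably be exercised from user space roughly like this (the path is an assumption based on the child interface name from the log, and the `timeout` attribute exists only with the debug patch applied):

```shell
# Hypothetical usage: force a TX timeout on the patched child interface.
# The store handler stops the queue and calls ipoib_timeout() with the
# written value as the queue number.
echo 26 > /sys/class/net/ib0.beef/timeout
```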