[0/2] bugfix for ipoib


Message ID

20231120203501.321587-1-jinpu.wang@ionos.com (mailing list archive)

Headers

From: Jack Wang <jinpu.wang@ionos.com>
To: linux-rdma@vger.kernel.org
Cc: leon@kernel.org, jgg@ziepe.ca
Subject: [PATCH 0/2] bugfix for ipoib
Date: Mon, 20 Nov 2023 21:34:59 +0100
Message-Id: <20231120203501.321587-1-jinpu.wang@ionos.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

bugfix for ipoib

Message

Jinpu Wang Nov. 20, 2023, 8:34 p.m. UTC

We often run into queue timeouts with a call trace like this:
NETDEV WATCHDOG: ib0.beef (): transmit queue 26 timed out
Call Trace:
call_timer_fn+0x27/0x100
__run_timers.part.0+0x1be/0x230
? mlx5_cq_tasklet_cb+0x6d/0x140 [mlx5_core]
run_timer_softirq+0x26/0x50
__do_softirq+0xbc/0x26d
asm_call_irq_on_stack+0xf/0x20
ib0.beef: transmit timeout: latency 10 msecs
ib0.beef: queue stopped 0, tx_head 0, tx_tail 0, global_tx_head 0, global_tx_tail 0

The last two messages repeated for days.

After cross-checking with Mellanox OFED, I noticed that some bugfixes are
missing upstream, hence I take the liberty of sending them out.

Thx!

Jack Wang (2):
  ipoib: Fix error code return in ipoib_mcast_join
  ipoib: Add tx timeout work to recover queue stop situation

 drivers/infiniband/ulp/ipoib/ipoib.h          |  4 +++
 drivers/infiniband/ulp/ipoib/ipoib_ib.c       | 26 ++++++++++++++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c     | 33 +++++++++++++++++--
 .../infiniband/ulp/ipoib/ipoib_multicast.c    |  1 +
 4 files changed, 61 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe Nov. 21, 2023, 12:16 a.m. UTC | #1

On Mon, Nov 20, 2023 at 09:34:59PM +0100, Jack Wang wrote:
> We run into queue timeout often with call trace as such:
> NETDEV WATCHDOG: ib0.beef (): transmit queue 26 timed out
> Call Trace:
> call_timer_fn+0x27/0x100
> __run_timers.part.0+0x1be/0x230
> ? mlx5_cq_tasklet_cb+0x6d/0x140 [mlx5_core]
> run_timer_softirq+0x26/0x50
> __do_softirq+0xbc/0x26d
> asm_call_irq_on_stack+0xf/0x20
> ib0.beef: transmit timeout: latency 10 msecs
> ib0.beef: queue stopped 0, tx_head 0, tx_tail 0, global_tx_head 0, global_tx_tail 0
> 
> The last two message repeated for days.

You shouldn't get tx timeouts and fully stuck queues like that, it
suggests something else is very wrong in that system.

> After cross check with Mellanox OFED, I noticed some bugfix are missing in
> upstream, hence I take the liberty to send them out.

Recovery is recovery, it is just RAS

Jason

Jinpu Wang Nov. 21, 2023, 1:02 p.m. UTC | #2

Hi Jason.

On Tue, Nov 21, 2023 at 1:16 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Nov 20, 2023 at 09:34:59PM +0100, Jack Wang wrote:
> > We run into queue timeout often with call trace as such:
> > NETDEV WATCHDOG: ib0.beef (): transmit queue 26 timed out
> > Call Trace:
> > call_timer_fn+0x27/0x100
> > __run_timers.part.0+0x1be/0x230
> > ? mlx5_cq_tasklet_cb+0x6d/0x140 [mlx5_core]
> > run_timer_softirq+0x26/0x50
> > __do_softirq+0xbc/0x26d
> > asm_call_irq_on_stack+0xf/0x20
> > ib0.beef: transmit timeout: latency 10 msecs
> > ib0.beef: queue stopped 0, tx_head 0, tx_tail 0, global_tx_head 0, global_tx_tail 0
> >
> > The last two message repeated for days.
>
> You shouldn't get tx timeouts and fully stuck queues like that, it
> suggests something else is very wrong in that system.
We have hit such warnings from time to time over the years in different
locations, but can't reproduce them at will in our staging environment.

There are real problems out there.
>
> > After cross check with Mellanox OFED, I noticed some bugfix are missing in
> > upstream, hence I take the liberty to send them out.
>
> Recovery is recovery, it is just RAS

I managed to trigger the situation with an extra debug interface:

 static DEVICE_ATTR_RW(umcast);

+static ssize_t timeout_store(struct device *dev, struct device_attribute *attr,
+                            const char *buf, size_t count)
+{
+       unsigned long val = simple_strtoul(buf, NULL, 0);
+
+       netif_stop_queue(to_net_dev(dev));
+       ipoib_timeout(to_net_dev(dev), val);
+
+       return count;
+}
+
 int ipoib_add_umcast_attr(struct net_device *dev)
 {
        return device_create_file(&dev->dev, &dev_attr_umcast);
 }

+static DEVICE_ATTR_WO(timeout);
+
+int ipoib_add_timeout_attr(struct net_device *dev)
+{
+       return device_create_file(&dev->dev, &dev_attr_timeout);
+}
+
 static void set_base_guid(struct ipoib_dev_priv *priv, union ib_gid *gid)
 {
        struct ipoib_dev_priv *child_priv;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 0322dc75396f..9b5dd628da2e 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -148,6 +148,8 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct ipoib_dev_priv *priv,
                        goto sysfs_failed;
                if (ipoib_add_umcast_attr(ndev))
                        goto sysfs_failed;
+               if (ipoib_add_timeout_attr(ndev))
+                       goto sysfs_failed;

                if (device_create_file(&ndev->dev, &dev_attr_parent))
                        goto sysfs_failed;



Running iperf3 on a child interface and triggering the timeout via sysfs,
I'm able to trigger the WATCHDOG and the timeout without the recovery
patch, but can't trigger it with the fix.
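
For illustration, the sysfs trigger step could be driven from user space
with a small helper like the one below. This is only a sketch under
assumptions: the debug patch above must be applied, the attribute is assumed
to show up as /sys/class/net/<ifname>/timeout (because it is created on the
net_device's struct device), and the helper name write_queue_index is
hypothetical. The value written is what timeout_store() parses and passes to
ipoib_timeout() as the queue index.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write a TX queue index to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value on failure.
 *
 * Assumption: "path" points at the write-only "timeout" attribute added by
 * the debug patch, e.g. /sys/class/net/ib0.beef/timeout.
 */
int write_queue_index(const char *path, unsigned int queue)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -errno;

	/* The store callback parses the number and ignores the newline. */
	if (fprintf(f, "%u\n", queue) < 0)
		ret = -EIO;

	/* stdio buffers the data, so a store error may only show up here. */
	if (fclose(f) != 0 && ret == 0)
		ret = -errno;

	return ret;
}
```

Calling write_queue_index("/sys/class/net/ib0.beef/timeout", 26) while iperf3
is running on the child interface would correspond to the manual trigger
described above; the interface name and queue number are simply taken from
the log in the cover letter.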

I will send a v2 for the napi API change reported by the bot.
>
> Jason