
[2/2] nvme: rdma/tcp: call nvme_delete_dead_ctrl for handling reconnect failure

Message ID 20230530094322.258090-3-ming.lei@redhat.com
State New, archived
Series nvme: add nvme_delete_dead_ctrl for avoiding io deadlock

Commit Message

Ming Lei May 30, 2023, 9:43 a.m. UTC
Reconnect has failed after all retries were exhausted, so the controller is
no longer capable of handling IO and should be marked as dead. Call
nvme_delete_dead_ctrl() to handle the failure and avoid the following IO
deadlock:

1) writeback IO waits in __bio_queue_enter() because the queue is frozen
during error recovery

2) the reconnect failure handler removes the controller, and del_gendisk()
waits for the above writeback IO when syncing/invalidating the bdev; the
queue is never unfrozen on this path, so neither side can make progress

Fix the issue by calling nvme_delete_dead_ctrl(), which calls
nvme_mark_namespaces_dead() before deleting the disk, so the above writeback
IO fails and the IO deadlock is avoided.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/nvme/host/rdma.c | 2 +-
 drivers/nvme/host/tcp.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
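To make the deadlock described in the commit message concrete, here is a minimal single-threaded C model of it. Every name in it (struct model_queue, model_queue_enter(), model_del_gendisk(), model_remove()) is hypothetical and not a kernel API. The model assumes only what the commit message states: a frozen queue parks writeback bios, disk removal must wait for them, and marking the namespace dead first makes the parked bios fail instead of sleeping. It is a sketch, not the block layer's actual logic.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of one namespace's request queue.
 * None of these names exist in the kernel. */
struct model_queue {
	bool frozen;  /* queue frozen by error recovery */
	bool dead;    /* namespace marked dead before disk removal */
	int waiters;  /* writeback bios parked waiting to enter the queue */
};

/* Rough analogue of the condition a bio sleeps on when entering the queue:
 * 0 = may enter, -EIO = failed because the disk is dead,
 * -EAGAIN = must keep sleeping. */
static int model_queue_enter(const struct model_queue *q)
{
	if (q->dead)
		return -EIO;
	if (q->frozen)
		return -EAGAIN;
	return 0;
}

/* Disk removal has to wait for every parked writeback bio to finish.
 * Returns false if a bio would sleep forever (nobody unfreezes the queue),
 * i.e. the deadlock; *failed_bios counts bios completed with an error. */
static bool model_del_gendisk(struct model_queue *q, int *failed_bios)
{
	*failed_bios = 0;
	while (q->waiters > 0) {
		int ret = model_queue_enter(q);

		if (ret == -EAGAIN)
			return false;
		if (ret == -EIO)
			(*failed_bios)++;
		q->waiters--; /* bio entered or was failed: no longer blocks removal */
	}
	return true;
}

/* mark_dead_first models calling nvme_mark_namespaces_dead() before
 * deleting the disk, as nvme_delete_dead_ctrl() is described to do. */
static bool model_remove(struct model_queue *q, bool mark_dead_first,
			 int *failed_bios)
{
	assert(q != NULL && failed_bios != NULL);
	if (mark_dead_first)
		q->dead = true;
	return model_del_gendisk(q, failed_bios);
}
```

With mark_dead_first set to false on a frozen queue, model_remove() returns false and both bios stay parked (the deadlock); with it set to true, both bios are failed and removal completes.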

Comments

Yi Zhang June 5, 2023, 2:31 p.m. UTC
Thanks Ming, feel free to add:
Tested-by: Yi Zhang <yi.zhang@redhat.com>

On Tue, May 30, 2023 at 5:44 PM Ming Lei <ming.lei@redhat.com> wrote:
> [...]

Patch

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 0eb79696fb73..cdf5855c3009 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1028,7 +1028,7 @@ static void nvme_rdma_reconnect_or_remove(struct nvme_rdma_ctrl *ctrl)
 		queue_delayed_work(nvme_wq, &ctrl->reconnect_work,
 				ctrl->ctrl.opts->reconnect_delay * HZ);
 	} else {
-		nvme_delete_ctrl(&ctrl->ctrl);
+		nvme_delete_dead_ctrl(&ctrl->ctrl);
 	}
 }
 
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index bf0230442d57..2c119bff7010 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2047,7 +2047,7 @@ static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
 				ctrl->opts->reconnect_delay * HZ);
 	} else {
 		dev_info(ctrl->device, "Removing controller...\n");
-		nvme_delete_ctrl(ctrl);
+		nvme_delete_dead_ctrl(ctrl);
 	}
 }
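Both hunks change only the give-up branch of a reconnect-or-remove decision whose condition the diff context does not show. The C sketch below models that decision, assuming the condition is a retry-budget check along the lines of the fabrics helper nvmf_should_reconnect() (attempts made so far below a maximum, with -1 meaning unlimited). Treat that exact condition as an assumption; struct model_ctrl, enum model_action and both model_* functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; not the kernel's struct nvme_ctrl. */
struct model_ctrl {
	int nr_reconnects;  /* reconnect attempts made so far */
	int max_reconnects; /* retry budget; -1 means retry forever (assumed) */
};

enum model_action {
	MODEL_RECONNECT,        /* re-arm the delayed reconnect work */
	MODEL_DELETE_DEAD_CTRL, /* give up: nvme_delete_dead_ctrl() after this patch */
};

/* Assumed shape of the retry-budget check, modeled on nvmf_should_reconnect(). */
static bool model_should_reconnect(const struct model_ctrl *c)
{
	return c->max_reconnects == -1 || c->nr_reconnects < c->max_reconnects;
}

/* Mirrors the if/else structure visible in both hunks. */
static enum model_action model_reconnect_or_remove(const struct model_ctrl *c)
{
	assert(c != NULL);
	if (model_should_reconnect(c))
		return MODEL_RECONNECT;
	return MODEL_DELETE_DEAD_CTRL;
}
```

For example, with max_reconnects = 3, once nr_reconnects reaches 3 the sketch selects MODEL_DELETE_DEAD_CTRL, which is the point where this patch switches from nvme_delete_ctrl() to nvme_delete_dead_ctrl().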