diff mbox

[3/5] nvme-rdma: Make sure to shutdown the controller if we can

Message ID 1469822242-3477-4-git-send-email-sagi@grimberg.me (mailing list archive)
State Not Applicable
Headers show

Commit Message

Sagi Grimberg July 29, 2016, 7:57 p.m. UTC
Relying on ctrl state in nvme_rdma_shutdown_ctrl is wrong because
it will never be NVME_CTRL_LIVE (delete_ctrl or reset_ctrl invoked it).

Instead, check that the admin queue is connected. Note that it is safe
because we can never see a copmeting thread trying to destroy the admin
queue (reset or delete controller).

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
---
 drivers/nvme/host/rdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig Aug. 1, 2016, 11:04 a.m. UTC | #1
On Fri, Jul 29, 2016 at 10:57:20PM +0300, Sagi Grimberg wrote:
> Relying on ctrl state in nvme_rdma_shutdown_ctrl is wrong because
> it will never be NVME_CTRL_LIVE (delete_ctrl or reset_ctrl invoked it).
> 
> Instead, check that the admin queue is connected. Note that it is safe
> because we can never see a copmeting thread trying to destroy the admin
> queue (reset or delete controller).
> 
> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
> ---
>  drivers/nvme/host/rdma.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index a70eb3cbf656..641ab7f91899 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1644,7 +1644,7 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl)
>  		nvme_rdma_free_io_queues(ctrl);
>  	}
>  
> -	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
> +	if (test_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[0].flags))
>  		nvme_shutdown_ctrl(&ctrl->ctrl);
>  
>  	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);

Maybe the right way to handle this is to unconditionally call
nvme_shutdown_ctrl and make sure we return an early error
on the register write?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Aug. 1, 2016, 11:17 a.m. UTC | #2
>> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
>> index a70eb3cbf656..641ab7f91899 100644
>> --- a/drivers/nvme/host/rdma.c
>> +++ b/drivers/nvme/host/rdma.c
>> @@ -1644,7 +1644,7 @@ static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl)
>>  		nvme_rdma_free_io_queues(ctrl);
>>  	}
>>
>> -	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
>> +	if (test_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[0].flags))
>>  		nvme_shutdown_ctrl(&ctrl->ctrl);
>>
>>  	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
>
> Maybe the right way to handle this is to unconditionally call
> nvme_shutdown_ctrl and make sure we return an early error
> on the register write?

As I wrote on patch 2/5 reply, I'd like to avoid depending on
queue_rq to fail early peeking at the ctrl state. This dependency
can grow in the future and I think we should at least try not to
go there...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Aug. 1, 2016, 3:48 p.m. UTC | #3
On Mon, Aug 01, 2016 at 02:17:37PM +0300, Sagi Grimberg wrote:
>>> -	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
>>> +	if (test_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[0].flags))
>>>  		nvme_shutdown_ctrl(&ctrl->ctrl);
>>>
>>>  	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
>>
>> Maybe the right way to handle this is to unconditionally call
>> nvme_shutdown_ctrl and make sure we return an early error
>> on the register write?
>
> As I wrote on patch 2/5 reply, I'd like to avoid depending on
> queue_rq to fail early peeking at the ctrl state. This dependency
> can grow in the future and I think we should at least try not to
> go there...

I don't want ->queue_rq to peek at controller state.  What I had
in mind was copying the PCIe behavior of failing early in the
timeout handler if we are in the reset handler, which will
mean blk_execute_rq will return ASAP with an error.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Aug. 2, 2016, 6:19 a.m. UTC | #4
>>>> -	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
>>>> +	if (test_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[0].flags))
>>>>  		nvme_shutdown_ctrl(&ctrl->ctrl);
>>>>
>>>>  	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);
>>>
>>> Maybe the right way to handle this is to unconditionally call
>>> nvme_shutdown_ctrl and make sure we return an early error
>>> on the register write?
>>
>> As I wrote on patch 2/5 reply, I'd like to avoid depending on
>> queue_rq to fail early peeking at the ctrl state. This dependency
>> can grow in the future and I think we should at least try not to
>> go there...
>
> I don't want ->queue_rq to peek at controller state.  What I had
> in mind was copying the PCIe behavior of failing early in the
> timeout handler if we are in the reset handler, which will
> mean blk_execute_rq will return ASAP with an error.

Do we want the unnecessary ADMIN_TIMEOUT for sending a shutdown
on a queue we know is not connected? I'm worried it will have scale
issues...
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig Aug. 2, 2016, 12:52 p.m. UTC | #5
On Tue, Aug 02, 2016 at 09:19:56AM +0300, Sagi Grimberg wrote:
>> I don't want ->queue_rq to peek at controller state.  What I had
>> in mind was copying the PCIe behavior of failing early in the
>> timeout handler if we are in the reset handler, which will
>> mean blk_execute_rq will return ASAP with an error.
>
> Do we want the unnecessary ADMIN_TIMEOUT for sending a shutdown
> on a queue we know is not connected? I'm worried it will have scale
> issues...

Allright, let's go with your version.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg Aug. 2, 2016, 1:26 p.m. UTC | #6
>>> I don't want ->queue_rq to peek at controller state.  What I had
>>> in mind was copying the PCIe behavior of failing early in the
>>> timeout handler if we are in the reset handler, which will
>>> mean blk_execute_rq will return ASAP with an error.
>>
>> Do we want the unnecessary ADMIN_TIMEOUT for sending a shutdown
>> on a queue we know is not connected? I'm worried it will have scale
>> issues...
>
> Allright, let's go with your version.

Is this an implicit reviewed-by?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index a70eb3cbf656..641ab7f91899 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1644,7 +1644,7 @@  static void nvme_rdma_shutdown_ctrl(struct nvme_rdma_ctrl *ctrl)
 		nvme_rdma_free_io_queues(ctrl);
 	}
 
-	if (ctrl->ctrl.state == NVME_CTRL_LIVE)
+	if (test_bit(NVME_RDMA_Q_CONNECTED, &ctrl->queues[0].flags))
 		nvme_shutdown_ctrl(&ctrl->ctrl);
 
 	blk_mq_stop_hw_queues(ctrl->ctrl.admin_q);