diff mbox

[v1,1/3] IB/srp: Fix crash when unmapping data loop

Message ID 530B6444.1000805@acm.org (mailing list archive)
State Rejected
Headers show

Commit Message

Bart Van Assche Feb. 24, 2014, 3:24 p.m. UTC
On 02/24/14 15:30, Sagi Grimberg wrote:
> When unmapping request data, it is unsafe automatically
> decrement req->nfmr regardless of it's value. This may
> happen since IO and reconnect flow may run concurrently
> resulting in req->nfmr = -1 and falsely call ib_fmr_pool_unmap.
> 
> Fix the loop condition to be greater than zero (which
> explicitly means that FMRs were used on this request)
> and only increment when needed.
> 
> This crash is easily reproduceable with ConnectX VFs OR
> Connect-IB (where FMRs are not supported)
> 
> Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c |    5 ++++-
>  1 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 529b6bc..0e20bfb 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -766,8 +766,11 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
>  		return;
>  
>  	pfmr = req->fmr_list;
> -	while (req->nfmr--)
> +
> +	while (req->nfmr > 0) {
>  		ib_fmr_pool_unmap(*pfmr++);
> +		req->nfmr--;
> +	}
>  
>  	ib_dma_unmap_sg(ibdev, scsi_sglist(scmnd), scsi_sg_count(scmnd),
>  			scmnd->sc_data_direction);
> 

Hello Sagi,

If srp_unmap_data() got invoked twice for the same request that means
that a race condition has been hit. I would like to see the root cause
of that race condition fixed instead of making it safe to invoke
srp_unmap_data() multiple times. It would be appreciated if you could
verify whether the following patch is a valid alternative for the above
patch:


Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sagi Grimberg Feb. 27, 2014, 11:32 a.m. UTC | #1
On 2/24/2014 5:24 PM, Bart Van Assche wrote:
> On 02/24/14 15:30, Sagi Grimberg wrote:
>> When unmapping request data, it is unsafe automatically
>> decrement req->nfmr regardless of it's value. This may
>> happen since IO and reconnect flow may run concurrently
>> resulting in req->nfmr = -1 and falsely call ib_fmr_pool_unmap.
>>
>> Fix the loop condition to be greater than zero (which
>> explicitly means that FMRs were used on this request)
>> and only increment when needed.
>>
>> This crash is easily reproduceable with ConnectX VFs OR
>> Connect-IB (where FMRs are not supported)
>>
>> Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
>> ---
>>   drivers/infiniband/ulp/srp/ib_srp.c |    5 ++++-
>>   1 files changed, 4 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index 529b6bc..0e20bfb 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -766,8 +766,11 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
>>   		return;
>>   
>>   	pfmr = req->fmr_list;
>> -	while (req->nfmr--)
>> +
>> +	while (req->nfmr > 0) {
>>   		ib_fmr_pool_unmap(*pfmr++);
>> +		req->nfmr--;
>> +	}
>>   
>>   	ib_dma_unmap_sg(ibdev, scsi_sglist(scmnd), scsi_sg_count(scmnd),
>>   			scmnd->sc_data_direction);
>>
> Hello Sagi,
>
> If srp_unmap_data() got invoked twice for the same request that means
> that a race condition has been hit. I would like to see the root cause
> of that race condition fixed instead of making it safe to invoke
> srp_unmap_data() multiple times. It would be appreciated if you could
> verify whether the following patch is a valid alternative for the above
> patch:

Hey Bart, Sorry for the late response,

It's been a while since I did this, but as I recall the problem wasn't 
in srp_post_send failure path.
Moreover I don't think a spinlock on srp_post_send is a good idea unless 
it is absolutely needed
(I don't think so).

As I recall  (need to re-confirm this), the problem was that in unstable 
ports condition srp_abort is
called invoking srp_free_req (unmap call #1) and reconnect process (or 
reset_device or terminate_io)
finishes all requests in the request ring (unmap call #2). when FMRs are 
used then nfmr goes to zero
the first time, but when FMRs are not supported nfmr goes from 0 to -1 
causing the crash since nfmr condition
is not safe.

> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index b753a7a..2c75f95 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -2141,8 +2141,7 @@ static int SRP_QUEUECOMMAND(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>   	cmd->tag    = req->index;
>   	memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len);
>   
> -	req->scmnd    = scmnd;
> -	req->cmd      = iu;
> +	req->cmd = iu;
>   
>   	len = srp_map_data(scmnd, ch, req);
>   	if (len < 0) {
> @@ -2160,7 +2159,12 @@ static int SRP_QUEUECOMMAND(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>   	ib_dma_sync_single_for_device(dev, iu->dma, target->max_iu_len,
>   				      DMA_TO_DEVICE);
>   
> -	if (srp_post_send(ch, iu, len)) {
> +	spin_lock_irqsave(&ch->lock, flags);
> +	req->scmnd = scmnd;
> +	ret = srp_post_send(ch, iu, len) ? SCSI_MLQUEUE_HOST_BUSY : 0;
> +	spin_unlock_irqrestore(&ch->lock, flags);
> +
> +	if (ret) {
>   		shost_printk(KERN_ERR, target->scsi_host, PFX "Send failed\n");
>   		goto err_unmap;
>   	}
>
> Thanks,
>
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index b753a7a..2c75f95 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2141,8 +2141,7 @@  static int SRP_QUEUECOMMAND(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	cmd->tag    = req->index;
 	memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len);
 
-	req->scmnd    = scmnd;
-	req->cmd      = iu;
+	req->cmd = iu;
 
 	len = srp_map_data(scmnd, ch, req);
 	if (len < 0) {
@@ -2160,7 +2159,12 @@  static int SRP_QUEUECOMMAND(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	ib_dma_sync_single_for_device(dev, iu->dma, target->max_iu_len,
 				      DMA_TO_DEVICE);
 
-	if (srp_post_send(ch, iu, len)) {
+	spin_lock_irqsave(&ch->lock, flags);
+	req->scmnd = scmnd;
+	ret = srp_post_send(ch, iu, len) ? SCSI_MLQUEUE_HOST_BUSY : 0;
+	spin_unlock_irqrestore(&ch->lock, flags);
+
+	if (ret) {
 		shost_printk(KERN_ERR, target->scsi_host, PFX "Send failed\n");
 		goto err_unmap;
 	}