[0/2] fix double completion of timed out commands

Message ID 20191021195628.19849-1-josef@toxicpanda.com (mailing list archive)

Message

Josef Bacik Oct. 21, 2019, 7:56 p.m. UTC
We noticed a problem where NBD sometimes double completes the same request when
things go wrong and we time out the request.  If the other side goes out to
lunch but happens to reply just as we're timing out the requests we can end up
with a double completion on the request.

We already keep track of the command status, we just need to make sure we
protect all cases where we set cmd->status with the cmd->lock, which is patch
#1.  Patch #2 is the fix for the problem, which catches the case where we race
with the timeout handler and the reply handler.  Thanks,

Josef
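
To illustrate the idea in the cover letter, here is a minimal userspace sketch, not the actual patches. It assumes a per-command lock that guards the status field and has both the timeout path and the reply path check the status under that lock, so only the first one to arrive completes the request. All names (struct cmd, cmd_finish, timeout_handler, reply_handler, the status values and the completions counter) are hypothetical and use pthreads rather than kernel primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-command state; not the nbd driver's types. */
enum cmd_status { CMD_INFLIGHT, CMD_OK, CMD_TIMED_OUT };

struct cmd {
    pthread_mutex_t lock;    /* guards status and completions */
    enum cmd_status status;
    int completions;         /* number of times the request was completed */
};

static void cmd_init(struct cmd *cmd)
{
    pthread_mutex_init(&cmd->lock, NULL);
    cmd->status = CMD_INFLIGHT;
    cmd->completions = 0;
}

/*
 * Complete the command with the given result, but only if nobody has
 * completed it yet. Returns true if this caller performed the completion.
 */
static bool cmd_finish(struct cmd *cmd, enum cmd_status result)
{
    bool won = false;

    assert(cmd != NULL);
    pthread_mutex_lock(&cmd->lock);
    if (cmd->status == CMD_INFLIGHT) {
        /* Status is only ever written with the lock held. */
        cmd->status = result;
        cmd->completions++;
        won = true;
    }
    pthread_mutex_unlock(&cmd->lock);
    return won;
}

/* Timeout path: gives up on the command unless a reply already finished it. */
static bool timeout_handler(struct cmd *cmd)
{
    return cmd_finish(cmd, CMD_TIMED_OUT);
}

/* Reply path: a late reply is ignored if the timeout already finished the command. */
static bool reply_handler(struct cmd *cmd)
{
    return cmd_finish(cmd, CMD_OK);
}
```

In this sketch, whichever handler takes the lock first sets the status and completes the request; the loser sees a status other than CMD_INFLIGHT and backs off, so the request is never completed twice.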

Comments

Mike Christie Oct. 21, 2019, 9:43 p.m. UTC | #1
On 10/21/2019 02:56 PM, Josef Bacik wrote:
> We noticed a problem where NBD sometimes double completes the same request when
> things go wrong and we time out the request.  If the other side goes out to
> lunch but happens to reply just as we're timing out the requests we can end up
> with a double completion on the request.
> 
> We already keep track of the command status, we just need to make sure we
> protect all cases where we set cmd->status with the cmd->lock, which is patch
> #1.  Patch #2 is the fix for the problem, which catches the case where we race
> with the timeout handler and the reply handler.  Thanks,
> 

Patches look ok and tested ok for me.

Reviewed-by: Mike Christie <mchristi@redhat.com>

Jens Axboe Oct. 25, 2019, 8:20 p.m. UTC | #2
On 10/21/19 1:56 PM, Josef Bacik wrote:
> We noticed a problem where NBD sometimes double completes the same request when
> things go wrong and we time out the request.  If the other side goes out to
> lunch but happens to reply just as we're timing out the requests we can end up
> with a double completion on the request.
> 
> We already keep track of the command status, we just need to make sure we
> protect all cases where we set cmd->status with the cmd->lock, which is patch
> #1.  Patch #2 is the fix for the problem, which catches the case where we race
> with the timeout handler and the reply handler.  Thanks,

Applied, thanks.