Message ID: 20220823145005.26356-1-suwan.kim027@gmail.com (mailing list archive)
State: New, archived
Series: virtio-blk: Fix WARN_ON_ONCE in virtio_queue_rq()
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
On Tue, Aug 23, 2022 at 11:50:05PM +0900, Suwan Kim wrote:
> @@ -409,6 +409,8 @@ static bool virtblk_add_req_batch(struct virtio_blk_vq *vq,
>  			virtblk_unmap_data(req, vbr);
>  			virtblk_cleanup_cmd(req);
>  			rq_list_add(requeue_list, req);
> +		} else {
> +			blk_mq_start_request(req);
>  		}

The device may see new requests as soon as virtblk_add_req() is called
above. Therefore the device may complete the request before
blk_mq_start_request() is called.

rq->io_start_time_ns = ktime_get_ns() will be after the request was
actually submitted.

I think blk_mq_start_request() needs to be called before
virtblk_add_req().

Stefan
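For readers following the thread, the ordering being discussed, condensed from the hunk quoted above plus its surrounding driver context, looks roughly like this (an annotated sketch, not the complete function):

	err = virtblk_add_req(vq->vq, vbr);	/* the device may see the request from here on */
	if (err) {
		virtblk_unmap_data(req, vbr);
		virtblk_cleanup_cmd(req);
		rq_list_add(requeue_list, req);
	} else {
		/* v1: accounting starts only now, so rq->io_start_time_ns
		 * (ktime_get_ns() inside blk_mq_start_request()) can be taken
		 * after the request was already handed to the device.
		 */
		blk_mq_start_request(req);
	}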
Hi Stefan,

On Wed, Aug 24, 2022 at 5:56 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Tue, Aug 23, 2022 at 11:50:05PM +0900, Suwan Kim wrote:
> > @@ -409,6 +409,8 @@ static bool virtblk_add_req_batch(struct virtio_blk_vq *vq,
> >  			virtblk_unmap_data(req, vbr);
> >  			virtblk_cleanup_cmd(req);
> >  			rq_list_add(requeue_list, req);
> > +		} else {
> > +			blk_mq_start_request(req);
> >  		}
>
> The device may see new requests as soon as virtblk_add_req() is called
> above. Therefore the device may complete the request before
> blk_mq_start_request() is called.
>
> rq->io_start_time_ns = ktime_get_ns() will be after the request was
> actually submitted.
>
> I think blk_mq_start_request() needs to be called before
> virtblk_add_req().

But if blk_mq_start_request() is called before virtblk_add_req()
and virtblk_add_req() fails, it can trigger WARN_ON_ONCE() at
virtio_queue_rq().

With regard to the race condition between virtblk_add_req() and
completion, I think that the race condition can not happen because
virtblk_add_req() holds vq lock with irq saving and completion side
(virtblk_done, virtblk_poll) need to acquire the vq lock also.
Moreover, virtblk_done() is irq context so I think it can not be
executed until virtblk_add_req() releases the lock.

Regards,
Suwan Kim
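The per-virtqueue locking Suwan refers to looks roughly like this (a simplified sketch of the submission and completion sides as the driver stood at the time, not verbatim code):

	/* submission side: virtio_queue_rq() / virtblk_add_req_batch() */
	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
	err = virtblk_add_req(vblk->vqs[qid].vq, vbr);
	/* ... error handling, kick preparation ... */
	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);

	/* completion side: virtblk_done() (irq context) / virtblk_poll() */
	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
	/* ... virtqueue_get_buf() loop, blk_mq_complete_request() ... */
	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);

Since both sides serialize on the same lock, a completion cannot be processed between virtblk_add_req() and a blk_mq_start_request() call made while the lock is still held; as the next reply clarifies, the remaining concern is only the timestamp.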
On Wed, Aug 24, 2022 at 10:16:10PM +0900, Kim Suwan wrote:
> Hi Stefan,
>
> On Wed, Aug 24, 2022 at 5:56 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > [...]
> > I think blk_mq_start_request() needs to be called before
> > virtblk_add_req().
>
> But if blk_mq_start_request() is called before virtblk_add_req()
> and virtblk_add_req() fails, it can trigger WARN_ON_ONCE() at
> virtio_queue_rq().
>
> With regard to the race condition between virtblk_add_req() and
> completion, I think that the race condition can not happen because
> virtblk_add_req() holds vq lock with irq saving and completion side
> (virtblk_done, virtblk_poll) need to acquire the vq lock also.
> Moreover, virtblk_done() is irq context so I think it can not be
> executed until virtblk_add_req() releases the lock.

I agree there is no race condition regarding the ordering of
blk_mq_start_request() and request completion. The spinlock prevents
that and I wasn't concerned about that part.

The issue is that the timestamp will be garbage. If we capture the
timestamp during/after the request is executing, then the collected
statistics will be wrong.

Can you look for another solution that doesn't break the timestamp?

FWIW I see that the rq->state state machine allows returning to the idle
state once the request has been started: __blk_mq_requeue_request().

Stefan
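For context on why the timestamp matters: blk_mq_start_request() records the issue time, and the block layer's stats path measures each request as completion time minus that value. Paraphrasing the relevant pieces of block/blk-mq.c and block/blk-stat.c from kernels of that era (a sketch, not verbatim):

	/* in blk_mq_start_request(), simplified */
	if (test_bit(QUEUE_FLAG_STATS, &rq->q->queue_flags)) {
		rq->io_start_time_ns = ktime_get_ns();	/* "issue" timestamp */
		rq->rq_flags |= RQF_STATS;
	}

	/* at completion, the stats sample is computed roughly as */
	value = (now >= rq->io_start_time_ns) ? now - rq->io_start_time_ns : 0;

If blk_mq_start_request() only runs after virtblk_add_req() has handed the request to the device, io_start_time_ns is later than the real submission time, so the measured I/O time comes out too short (possibly clamped to zero).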
On Thu, Aug 25, 2022 at 2:32 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> [...]
> I agree there is no race condition regarding the ordering of
> blk_mq_start_request() and request completion. The spinlock prevents
> that and I wasn't concerned about that part.
>
> The issue is that the timestamp will be garbage. If we capture the
> timestamp during/after the request is executing, then the collected
> statistics will be wrong.
>
> Can you look for another solution that doesn't break the timestamp?
>
> FWIW I see that the rq->state state machine allows returning to the idle
> state once the request has been started: __blk_mq_requeue_request().

I considered blk_mq_requeue_request() to handle error cases but
I didn't use it because I think it can make the error path request
processing slower than requeuing an error request to plug list again.

But there doesn't seem to be any other option that doesn't break
the timestamp.

As you said, I will use __blk_mq_requeue_request() and send
new patch soon.

To Alexandre,

I will share new diff soon. Could you please test one more time?

Regards,
Suwan Kim
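The two error-path options being weighed here, shown side by side as a sketch (in the context of the batch path's failure branch):

	/* (a) v1 approach: put the failed request back on the caller's plug
	 *     list so virtio_queue_rqs() can retry it cheaply on a later flush.
	 */
	rq_list_add(requeue_list, req);

	/* (b) alternative discussed above: hand the already-started request
	 *     back to blk-mq. blk_mq_requeue_request() uses
	 *     __blk_mq_requeue_request() to move it back to the IDLE state and
	 *     re-dispatches it later, so blk_mq_start_request() runs again and
	 *     records a fresh timestamp on the next attempt.
	 */
	blk_mq_requeue_request(req, true);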
On Thu, Aug 25, 2022 at 11:50 PM Suwan Kim <suwan.kim027@gmail.com> wrote:
> [...]
> As you said, I will use __blk_mq_requeue_request() and send
> new patch soon.
>
> To Alexandre,
>
> I will share new diff soon. Could you please test one more time?

Absolutely! Thanks for looking into this.
On Fri, Aug 26, 2022 at 10:41:39AM +0900, Alexandre Courbot wrote:
> On Thu, Aug 25, 2022 at 11:50 PM Suwan Kim <suwan.kim027@gmail.com> wrote:
> > [...]
> > To Alexandre,
> >
> > I will share new diff soon. Could you please test one more time?
>
> Absolutely! Thanks for looking into this.

Hi Alexandre,

Could you test this patch?
If it works, I will send v2 patch.
Regards,
Suwan Kim

---
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 30255fcaf181..dd9a05174726 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -322,14 +322,14 @@ static blk_status_t virtblk_prep_rq(struct blk_mq_hw_ctx *hctx,
 	if (unlikely(status))
 		return status;
 
-	blk_mq_start_request(req);
-
 	vbr->sg_table.nents = virtblk_map_data(hctx, req, vbr);
 	if (unlikely(vbr->sg_table.nents < 0)) {
 		virtblk_cleanup_cmd(req);
 		return BLK_STS_RESOURCE;
 	}
 
+	blk_mq_start_request(req);
+
 	return BLK_STS_OK;
 }
 
@@ -391,8 +391,7 @@ static bool virtblk_prep_rq_batch(struct request *req)
 }
 
 static bool virtblk_add_req_batch(struct virtio_blk_vq *vq,
-					struct request **rqlist,
-					struct request **requeue_list)
+					struct request **rqlist)
 {
 	unsigned long flags;
 	int err;
@@ -408,7 +407,7 @@ static bool virtblk_add_req_batch(struct virtio_blk_vq *vq,
 		if (err) {
 			virtblk_unmap_data(req, vbr);
 			virtblk_cleanup_cmd(req);
-			rq_list_add(requeue_list, req);
+			blk_mq_requeue_request(req, true);
 		}
 	}
 
@@ -436,7 +435,7 @@ static void virtio_queue_rqs(struct request **rqlist)
 
 		if (!next || req->mq_hctx != next->mq_hctx) {
 			req->rq_next = NULL;
-			kick = virtblk_add_req_batch(vq, rqlist, &requeue_list);
+			kick = virtblk_add_req_batch(vq, rqlist);
 			if (kick)
 				virtqueue_notify(vq->vq);
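Applied to the driver as it stood then, the batch path would read roughly as below. This is a reconstruction from the hunks above; the unchanged lines are assumed from the surrounding code and are not part of the diff:

static bool virtblk_add_req_batch(struct virtio_blk_vq *vq,
					struct request **rqlist)
{
	unsigned long flags;
	int err;
	bool kick;

	spin_lock_irqsave(&vq->lock, flags);

	while (!rq_list_empty(*rqlist)) {
		struct request *req = rq_list_pop(rqlist);
		struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);

		err = virtblk_add_req(vq->vq, vbr);
		if (err) {
			virtblk_unmap_data(req, vbr);
			virtblk_cleanup_cmd(req);
			/* the request was already started in virtblk_prep_rq(),
			 * so return it to blk-mq for a requeue instead of
			 * re-adding it to the plug list
			 */
			blk_mq_requeue_request(req, true);
		}
	}

	kick = virtqueue_kick_prepare(vq->vq);
	spin_unlock_irqrestore(&vq->lock, flags);

	return kick;
}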
Hi Suwan,

On Mon, Aug 29, 2022 at 11:49 AM Suwan Kim <suwan.kim027@gmail.com> wrote:
> [...]
> Hi Alexandre,
>
> Could you test this patch?
> If it works, I will send v2 patch.

This version is working fine for me!

Cheers,
Alex.
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 30255fcaf181..73a0620a7cff 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -322,8 +322,6 @@ static blk_status_t virtblk_prep_rq(struct blk_mq_hw_ctx *hctx,
 	if (unlikely(status))
 		return status;
 
-	blk_mq_start_request(req);
-
 	vbr->sg_table.nents = virtblk_map_data(hctx, req, vbr);
 	if (unlikely(vbr->sg_table.nents < 0)) {
 		virtblk_cleanup_cmd(req);
@@ -349,6 +347,8 @@ static blk_status_t virtio_queue_rq(struct blk_mq_hw_ctx *hctx,
 	if (unlikely(status))
 		return status;
 
+	blk_mq_start_request(req);
+
 	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
 	err = virtblk_add_req(vblk->vqs[qid].vq, vbr);
 	if (err) {
@@ -409,6 +409,8 @@ static bool virtblk_add_req_batch(struct virtio_blk_vq *vq,
 			virtblk_unmap_data(req, vbr);
 			virtblk_cleanup_cmd(req);
 			rq_list_add(requeue_list, req);
+		} else {
+			blk_mq_start_request(req);
 		}
 	}