block/io_uring: resubmit when result is -EAGAIN

Message ID 20210728103518.1221195-1-f.ebner@proxmox.com (mailing list archive)
State: New, archived
Series: block/io_uring: resubmit when result is -EAGAIN

Commit Message

Fiona Ebner July 28, 2021, 10:35 a.m. UTC
Quoting from [0]:

 Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
 completion path. Normally we expect this to happen inline as part
 of submission, but apparently SCSI has a weird corner case where it
 can happen as part of normal completions.

Host kernels without patch [0] can panic when this happens [1], and
resubmitting makes the panic more likely. On the other hand, for
kernels with patch [0], resubmitting ensures that a block job is not
aborted just because of such spurious errors.

[0]: https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u

[1]:
  #9 [ffffb732000c8b70] asm_exc_page_fault at ffffffffa4800ade
 #10 [ffffb732000c8bf8] io_prep_async_work at ffffffffa3d89c16
 #11 [ffffb732000c8c50] io_rw_reissue at ffffffffa3d8b2e1
 #12 [ffffb732000c8c78] io_complete_rw at ffffffffa3d8baa8
 #13 [ffffb732000c8c98] blkdev_bio_end_io at ffffffffa3d62a80
 #14 [ffffb732000c8cc8] bio_endio at ffffffffa3f4e800
 #15 [ffffb732000c8ce8] dec_pending at ffffffffa432f854
 #16 [ffffb732000c8d30] clone_endio at ffffffffa433170c
 #17 [ffffb732000c8d70] bio_endio at ffffffffa3f4e800
 #18 [ffffb732000c8d90] blk_update_request at ffffffffa3f53a37
 #19 [ffffb732000c8dd0] scsi_end_request at ffffffffa4233a5c
 #20 [ffffb732000c8e08] scsi_io_completion at ffffffffa423432c
 #21 [ffffb732000c8e58] scsi_finish_command at ffffffffa422c527
 #22 [ffffb732000c8e88] scsi_softirq_done at ffffffffa42341e4

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---

I'm new to this code and io_uring, so I don't know what all the
implications are, but retrying upon EAGAIN does not sound like
a bad thing to my inexperienced ears.

Some more context, leading up to this patch:

We had some users reporting issues after we switched to using io_uring
by default. Namely, kernel panics [2] for some, and failing block jobs
[3] (with a custom backup mechanism we implemented on top of QEMU's
block layer) for others.

I was lucky enough to reproduce the issue, and it manifested as a failed
block job about half of the time and as a kernel panic the other half.
When using a host kernel with [0], it's a failed block job every time,
and this patch attempts to fix that by resubmitting instead of bubbling
up the error.

[2]: https://forum.proxmox.com/threads/kernel-panic-whole-server-crashes-about-every-day.91803/post-404382
[3]: https://forum.proxmox.com/threads/backup-job-failed-with-err-11-on-2-of-6-vms.92568/

 block/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
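
For context, the resubmission path this patch relies on is tiny:
luring_resubmit() simply puts the request back on the unsubmitted queue,
and the next submission pass picks it up again. A minimal sketch, assuming
the block/io_uring.c structures of this era (not a verbatim copy of the
QEMU source):

    static void luring_resubmit(LuringState *s, LuringAIOCB *luringcb)
    {
        /* Re-queue the request; the next ioq_submit() pass will
         * prepare a fresh SQE for it and submit it again. */
        QSIMPLEQ_INSERT_TAIL(&s->io_q.submit_queue, luringcb, next);
        s->io_q.in_queue++;
    }

So a retry costs one extra submission round trip for the affected request.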

Comments

Philippe Mathieu-Daudé July 28, 2021, 12:09 p.m. UTC | #1
Cc'ing the maintainers for you. See
https://wiki.qemu.org/Contribute/SubmitAPatch#CC_the_relevant_maintainer

$ ./scripts/get_maintainer.pl -f block/io_uring.c
Aarushi Mehta <mehta.aaru20@gmail.com> (maintainer:Linux io_uring)
Julia Suvorova <jusual@redhat.com> (maintainer:Linux io_uring)
Stefan Hajnoczi <stefanha@redhat.com> (maintainer:Linux io_uring)
Kevin Wolf <kwolf@redhat.com> (supporter:Block layer core)
Max Reitz <mreitz@redhat.com> (supporter:Block layer core)
qemu-block@nongnu.org (open list:Linux io_uring)
qemu-devel@nongnu.org (open list:All patches CC here)

Also Cc'ing Stefano for commit b4e44c9944e ("io_uring: retry
io_uring_submit() if it fails with errno=EINTR").
(Stefano, you might want to add yourself an R: tag in MAINTAINERS).
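
For reference, the EINTR handling from that commit follows the usual
retry-on-signal pattern around io_uring_submit(), which returns a
negative errno value on failure. Roughly (a sketch of the pattern, not
the exact QEMU hunk):

    int ret;
    do {
        /* io_uring_submit() returns the number of submitted SQEs, or
         * -errno; -EINTR just means a signal interrupted the system
         * call and the submission can be retried as-is. */
        ret = io_uring_submit(&s->ring);
    } while (ret == -EINTR);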

On 7/28/21 12:35 PM, Fabian Ebner wrote:
> [...]
Stefano Garzarella July 28, 2021, 1:02 p.m. UTC | #2
On Wed, Jul 28, 2021 at 02:09:40PM +0200, Philippe Mathieu-Daudé wrote:
>Cc'ing the maintainers for you. See
>https://wiki.qemu.org/Contribute/SubmitAPatch#CC_the_relevant_maintainer
>
>[...]
>
>Also Cc'ing Stefano for commit b4e44c9944e ("io_uring: retry
>io_uring_submit() if it fails with errno=EINTR").

Thanks Phil!

>(Stefano, you might want to add yourself an R: tag in MAINTAINERS).

Yep, I'll send a patch for that.

>
>On 7/28/21 12:35 PM, Fabian Ebner wrote:
>> [...]
>>
>> I'm new to this code and io_uring, so I don't know what all the
>> implications are, but retrying upon EAGAIN does not sound like
>> a bad thing to my inexperienced ears.

Yeah, that doesn't sound bad.

For kernels that don't have the patch applied, I don't think there's 
much we can do about it, so this change seems okay to me:

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Stefan Hajnoczi July 28, 2021, 2:14 p.m. UTC | #3
On Wed, Jul 28, 2021 at 12:35:18PM +0200, Fabian Ebner wrote:
> [...]
> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
> ---
> 
> I'm new to this code and io_uring, so I don't know what all the
> implications are, but retrying upon EAGAIN does not sound like
> a bad thing to my inexperienced ears.
> 
> Some more context, leading up to this patch:
> 
> We had some users reporting issues after we switched to using io_uring
> by default. Namely, kernel panics [2] for some, and failing block jobs
> [3] (with a custom backup mechanism we implemented on top of QEMU's
> block layer) for others.
> 
> I was lucky enough to reproduce the issue, and it manifested as a failed
> block job about half of the time and as a kernel panic the other half.
> When using a host kernel with [0], it's a failed block job every time,
> and this patch attempts to fix that by resubmitting instead of bubbling
> up the error.

Great, thanks for the patch!

Some of the relevant information is below the '---' line and won't be
included in the git log. Please tweak the commit description to focus
on:
- io_uring may return spurious -EAGAIN with Linux SCSI
  (The details of how io_uring internally resubmits are not important
  but the fact that spurious -EAGAIN is visible to userspace matters.)
- Resubmitting such requests solves the problem

The kernel panic is not as important in the commit description since
this patch works the same regardless of whether the kernel panics
sometimes or not.

> 
> [...]
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 00a3ee9fb8..77d162cb24 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -165,7 +165,7 @@ static void luring_process_completions(LuringState *s)
>          total_bytes = ret + luringcb->total_read;
>  
>          if (ret < 0) {
> -            if (ret == -EINTR) {
> +            if (ret == -EINTR || ret == -EAGAIN) {
>                  luring_resubmit(s, luringcb);
>                  continue;
>              }

Please add a comment:

 /*
  * Only writev/readv/fsync requests on regular files or host block
  * devices are submitted. Therefore -EAGAIN is not expected but it's
  * known to happen sometimes with Linux SCSI. Submit again and hope the
  * request completes successfully.
  *
  * For more information, see:
  * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
  *
  * If the code is changed to submit other types of requests in the
  * future, then this workaround may need to be extended to deal with
  * genuine -EAGAIN results that should not be resubmitted immediately.
  */

Stefan
Patch

diff --git a/block/io_uring.c b/block/io_uring.c
index 00a3ee9fb8..77d162cb24 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -165,7 +165,7 @@ static void luring_process_completions(LuringState *s)
         total_bytes = ret + luringcb->total_read;
 
         if (ret < 0) {
-            if (ret == -EINTR) {
+            if (ret == -EINTR || ret == -EAGAIN) {
                 luring_resubmit(s, luringcb);
                 continue;
             }
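
For completeness, a sketch of how the hunk could look in a v2 with
Stefan's suggested comment folded in (assuming the surrounding code is
otherwise unchanged):

        if (ret < 0) {
            /*
             * Only writev/readv/fsync requests on regular files or host
             * block devices are submitted. Therefore -EAGAIN is not
             * expected but it's known to happen sometimes with Linux
             * SCSI. Submit again and hope the request completes
             * successfully.
             *
             * For more information, see:
             * https://lore.kernel.org/io-uring/20210727165811.284510-3-axboe@kernel.dk/T/#u
             *
             * If the code is changed to submit other types of requests in
             * the future, then this workaround may need to be extended to
             * deal with genuine -EAGAIN results that should not be
             * resubmitted immediately.
             */
            if (ret == -EINTR || ret == -EAGAIN) {
                luring_resubmit(s, luringcb);
                continue;
            }
        }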