Message ID | 20180615174729.20544-1-naravamudan@digitalocean.com (mailing list archive) |
---|---|
State | New, archived |
On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > laio_init() can fail for a couple of reasons, which will lead to a NULL > pointer dereference in laio_attach_aio_context(). > > To solve this, add a aio_setup_linux_aio() function which is called > before aio_get_linux_aio() where it is called currently, and which > propogates setup errors up. The signature of aio_get_linux_aio() was not s/propogates/propagates/ > modified, because it seems preferable to return the actual errno from > the possible failing initialization calls. > > With respect to the error-handling in the file-posix.c, we properly > bubble any errors up in raw_co_prw and in the case s of > raw_aio_{,un}plug, the result is the same as if s->use_linux_aio was not > set (but there is no bubbling up). In all three cases, if the setup > function fails, we fallback to the thread pool and an error message is > emitted. > > It is trivial to make qemu segfault in my testing. Set > /proc/sys/fs/aio-max-nr to 0 and start a guest with > aio=native,cache=directsync. With this patch, the guest successfully > starts (but obviously isn't using native AIO). Setting aio-max-nr back > up to a reasonable value, AIO contexts are consumed normally. > > Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> > > --- > > Changes from v1 -> v2: When posting a v2, it's best to post as a new thread, rather than in-reply-to the v1 thread, so that automated tooling knows to check the new patch. More patch submission tips at https://wiki.qemu.org/Contribute/SubmitAPatch > > Rather than affect virtio-scsi/blk at all, make all the changes internal > to file-posix.c. Thanks to Kevin Wolf for the suggested change. > --- > block/file-posix.c | 24 ++++++++++++++++++++++++ > block/linux-aio.c | 15 ++++++++++----- > include/block/aio.h | 3 +++ > include/block/raw-aio.h | 2 +- > stubs/linux-aio.c | 2 +- > util/async.c | 15 ++++++++++++--- > 6 files changed, 51 insertions(+), 10 deletions(-) > > diff --git a/block/file-posix.c b/block/file-posix.c > index 07bb061fe4..2415d09bf1 100644 > --- a/block/file-posix.c > +++ b/block/file-posix.c > @@ -1665,6 +1665,14 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, > type |= QEMU_AIO_MISALIGNED; > #ifdef CONFIG_LINUX_AIO > } else if (s->use_linux_aio) { > + int rc; > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > + if (rc != 0) { > + error_report("Unable to use native AIO, falling back to " > + "thread pool."); In general, error_report() should not output a trailing '.'. > + s->use_linux_aio = 0; > + return rc; Wait - the message claims we are falling back, but the non-zero return code sounds like we are returning an error instead of falling back. (My preference - if the user requested something and we can't do it, it's better to error than to fall back to something that does not match the user's request). > + } > LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs)); > assert(qiov->size == bytes); > return laio_co_submit(bs, aio, s->fd, offset, qiov, type); > @@ -1695,6 +1703,14 @@ static void raw_aio_plug(BlockDriverState *bs) > #ifdef CONFIG_LINUX_AIO > BDRVRawState *s = bs->opaque; > if (s->use_linux_aio) { > + int rc; > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > + if (rc != 0) { > + error_report("Unable to use native AIO, falling back to " > + "thread pool."); > + s->use_linux_aio = 0; Should s->use_linux_aio be a bool instead of an int?
On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > laio_init() can fail for a couple of reasons, which will lead to a NULL > > pointer dereference in laio_attach_aio_context(). > > > > To solve this, add a aio_setup_linux_aio() function which is called > > before aio_get_linux_aio() where it is called currently, and which > > propogates setup errors up. The signature of aio_get_linux_aio() was not > > s/propogates/propagates/ Thanks! > > modified, because it seems preferable to return the actual errno from > > the possible failing initialization calls. > > > > With respect to the error-handling in the file-posix.c, we properly > > bubble any errors up in raw_co_prw and in the case s of > > raw_aio_{,un}plug, the result is the same as if s->use_linux_aio was not > > set (but there is no bubbling up). In all three cases, if the setup > > function fails, we fallback to the thread pool and an error message is > > emitted. > > > > It is trivial to make qemu segfault in my testing. Set > > /proc/sys/fs/aio-max-nr to 0 and start a guest with > > aio=native,cache=directsync. With this patch, the guest successfully > > starts (but obviously isn't using native AIO). Setting aio-max-nr back > > up to a reasonable value, AIO contexts are consumed normally. > > > > Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> > > > > --- > > > > Changes from v1 -> v2: > > When posting a v2, it's best to post as a new thread, rather than > in-reply-to the v1 thread, so that automated tooling knows to check the new > patch. More patch submission tips at > https://wiki.qemu.org/Contribute/SubmitAPatch My apologies! I'll fix this in a (future) v3. > > Rather than affect virtio-scsi/blk at all, make all the changes internal > > to file-posix.c. Thanks to Kevin Wolf for the suggested change. > > --- > > block/file-posix.c | 24 ++++++++++++++++++++++++ > > block/linux-aio.c | 15 ++++++++++----- > > include/block/aio.h | 3 +++ > > include/block/raw-aio.h | 2 +- > > stubs/linux-aio.c | 2 +- > > util/async.c | 15 ++++++++++++--- > > 6 files changed, 51 insertions(+), 10 deletions(-) > > > > diff --git a/block/file-posix.c b/block/file-posix.c > > index 07bb061fe4..2415d09bf1 100644 > > --- a/block/file-posix.c > > +++ b/block/file-posix.c > > @@ -1665,6 +1665,14 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset, > > type |= QEMU_AIO_MISALIGNED; > > #ifdef CONFIG_LINUX_AIO > > } else if (s->use_linux_aio) { > > + int rc; > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > + if (rc != 0) { > > + error_report("Unable to use native AIO, falling back to " > > + "thread pool."); > > In general, error_report() should not output a trailing '.'. Will fix. > > + s->use_linux_aio = 0; > > + return rc; > > Wait - the message claims we are falling back, but the non-zero return code > sounds like we are returning an error instead of falling back. (My > preference - if the user requested something and we can't do it, it's better > to error than to fall back to something that does not match the user's > request). I think that makes sense, I hadn't tested this specific case (in my reading of the code, it wasn't clear to me if raw_co_prw() could be called before raw_aio_plug() had been called, but I think returning the error code up should be handled correctly. What about the cases where there is no error handling (the other two changes in the patch)? 
> > + } > > LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs)); > > assert(qiov->size == bytes); > > return laio_co_submit(bs, aio, s->fd, offset, qiov, type); > > @@ -1695,6 +1703,14 @@ static void raw_aio_plug(BlockDriverState *bs) > > #ifdef CONFIG_LINUX_AIO > > BDRVRawState *s = bs->opaque; > > if (s->use_linux_aio) { > > + int rc; > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > + if (rc != 0) { > > + error_report("Unable to use native AIO, falling back to " > > + "thread pool."); > > + s->use_linux_aio = 0; > > Should s->use_linux_aio be a bool instead of an int? It is: bool use_linux_aio:1; would you prefer I did a preparatory patch that converted users to true/false? Thanks, Nish
On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: <snip> > > > } else if (s->use_linux_aio) { > > > + int rc; > > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > + if (rc != 0) { > > > + error_report("Unable to use native AIO, falling back to " > > > + "thread pool."); > > > > In general, error_report() should not output a trailing '.'. > > Will fix. > > > > + s->use_linux_aio = 0; > > > + return rc; > > > > Wait - the message claims we are falling back, but the non-zero return code > > sounds like we are returning an error instead of falling back. (My > > preference - if the user requested something and we can't do it, it's better > > to error than to fall back to something that does not match the user's > > request). > > I think that makes sense, I hadn't tested this specific case (in my > reading of the code, it wasn't clear to me if raw_co_prw() could be > called before raw_aio_plug() had been called, but I think returning the > error code up should be handled correctly. What about the cases where > there is no error handling (the other two changes in the patch)? While looking at doing these changes, I realized that I'm not quite sure what the right approach is here. My original rationale for returning non-zero was that AIO was requested but could not be completed. I haven't fully tracked back the calling paths, but I assumed it would get retried at the top level, and since we indicated to not use AIO on subsequent calls, it will succeed and use threads then (note, that I do now realize this means a mismatch between the qemu command-line and the in-use AIO model). In practice, with my v2 patch, where I do return a non-zero error-code from this function, qemu does not exit (nor is any logging other than that I added emitted on the monitor). If I do not fallback, I imagine we would just continuously see this error message and IO might not actually every occur? Reworking all of the callpath to fail on non-zero returns from raw_co_prw() seems like a fair bit of work, but if that is what is being requested, I can try that (it will just take a while). Alternatively, I can produce a v3 quickly that does not bubble the actual errno all the way up (since it does seem like it is ignored anyways?). <snip> > > > + s->use_linux_aio = 0; > > > > Should s->use_linux_aio be a bool instead of an int? > > It is: > > bool use_linux_aio:1; > > would you prefer I did a preparatory patch that converted users to > true/false? Sorry, I misunderstood this -- only my patch does an assignment, so I'll switch to 'false'. Thanks, Nish
On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote: > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > <snip> > > > > > } else if (s->use_linux_aio) { > > > > + int rc; > > > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > > + if (rc != 0) { > > > > + error_report("Unable to use native AIO, falling back to " > > > > + "thread pool."); > > > > > > In general, error_report() should not output a trailing '.'. > > > > Will fix. > > > > > > + s->use_linux_aio = 0; > > > > + return rc; > > > > > > Wait - the message claims we are falling back, but the non-zero return code > > > sounds like we are returning an error instead of falling back. (My > > > preference - if the user requested something and we can't do it, it's better > > > to error than to fall back to something that does not match the user's > > > request). > > > > I think that makes sense, I hadn't tested this specific case (in my > > reading of the code, it wasn't clear to me if raw_co_prw() could be > > called before raw_aio_plug() had been called, but I think returning the > > error code up should be handled correctly. What about the cases where > > there is no error handling (the other two changes in the patch)? > > While looking at doing these changes, I realized that I'm not quite sure > what the right approach is here. My original rationale for returning > non-zero was that AIO was requested but could not be completed. I > haven't fully tracked back the calling paths, but I assumed it would get > retried at the top level, and since we indicated to not use AIO on > subsequent calls, it will succeed and use threads then (note, that I do > now realize this means a mismatch between the qemu command-line and the > in-use AIO model). > > In practice, with my v2 patch, where I do return a non-zero error-code > from this function, qemu does not exit (nor is any logging other than > that I added emitted on the monitor). If I do not fallback, I imagine we > would just continuously see this error message and IO might not actually > every occur? Reworking all of the callpath to fail on non-zero returns > from raw_co_prw() seems like a fair bit of work, but if that is what is > being requested, I can try that (it will just take a while). > Alternatively, I can produce a v3 quickly that does not bubble the > actual errno all the way up (since it does seem like it is ignored > anyways?). Sorry for the noise, but I had one more thought. Would it be appropriate to push the _setup() call up to when we parse the arguments about aio=native? E.g., we already check there if cache=directsync is specified and error out if not. We could, in theory, also call laio_init() there (via the new function) and error out to the CLI if that fails. Then the runtime paths would simply be able to use the context that was setup earlier? I would need to verify the laio_cleanup() happens correctly still. Thanks, Nish
Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben: > On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote: > > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > > > <snip> > > > > > > > } else if (s->use_linux_aio) { > > > > > + int rc; > > > > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > > > + if (rc != 0) { > > > > > + error_report("Unable to use native AIO, falling back to " > > > > > + "thread pool."); > > > > > > > > In general, error_report() should not output a trailing '.'. > > > > > > Will fix. > > > > > > > > + s->use_linux_aio = 0; > > > > > + return rc; > > > > > > > > Wait - the message claims we are falling back, but the non-zero return code > > > > sounds like we are returning an error instead of falling back. (My > > > > preference - if the user requested something and we can't do it, it's better > > > > to error than to fall back to something that does not match the user's > > > > request). > > > > > > I think that makes sense, I hadn't tested this specific case (in my > > > reading of the code, it wasn't clear to me if raw_co_prw() could be > > > called before raw_aio_plug() had been called, but I think returning the > > > error code up should be handled correctly. What about the cases where > > > there is no error handling (the other two changes in the patch)? > > > > While looking at doing these changes, I realized that I'm not quite sure > > what the right approach is here. My original rationale for returning > > non-zero was that AIO was requested but could not be completed. I > > haven't fully tracked back the calling paths, but I assumed it would get > > retried at the top level, and since we indicated to not use AIO on > > subsequent calls, it will succeed and use threads then (note, that I do > > now realize this means a mismatch between the qemu command-line and the > > in-use AIO model). > > > > In practice, with my v2 patch, where I do return a non-zero error-code > > from this function, qemu does not exit (nor is any logging other than > > that I added emitted on the monitor). If I do not fallback, I imagine we > > would just continuously see this error message and IO might not actually > > every occur? Reworking all of the callpath to fail on non-zero returns > > from raw_co_prw() seems like a fair bit of work, but if that is what is > > being requested, I can try that (it will just take a while). > > Alternatively, I can produce a v3 quickly that does not bubble the > > actual errno all the way up (since it does seem like it is ignored > > anyways?). > > Sorry for the noise, but I had one more thought. Would it be appropriate > to push the _setup() call up to when we parse the arguments about > aio=native? E.g., we already check there if cache=directsync is > specified and error out if not. We already do this: /* Currently Linux does AIO only for files opened with O_DIRECT */ if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) { error_setg(errp, "aio=native was specified, but it requires " "cache.direct=on, which was not specified."); ret = -EINVAL; goto fail; } laio_init() is about other types of errors. But anyway, yes, calling laio_init() already in .bdrv_open() is possible. Returning errors from .bdrv_open() is nice and easy and we should do it. However, we may also need to call laio_init() again when switching to a different I/O thread after the image is already opened. 
This is what I meant when I commented on v1 that you should do this in the .bdrv_attach_aio_context callback. The problem here is that we can't return an error there and the guest is already using the image. In this case, logging an error and falling back to the thread pool seems to be the best option we have. Kevin
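To make the two halves of that suggestion concrete, a rough sketch of how it could look in file-posix.c follows; the callback name, the error message wording, and the exact open-time placement are assumptions for illustration, not code from the posted patch:

```c
/* Sketch only: assumes aio_setup_linux_aio() returns 0 on success and a
 * negative errno on failure, as in the v2 patch. */

/* In raw_open_common(), a setup failure can still be turned into a hard
 * open error, so aio=native is refused rather than silently downgraded. */
    if (s->use_linux_aio) {
        ret = aio_setup_linux_aio(bdrv_get_aio_context(bs));
        if (ret < 0) {
            error_setg_errno(errp, -ret, "Failed to initialize native AIO");
            goto fail;
        }
    }

/* Hypothetical .bdrv_attach_aio_context implementation: the guest already
 * owns the image at this point, so the only option is to warn and fall
 * back to the thread pool. */
static void raw_aio_attach_aio_context(BlockDriverState *bs,
                                       AioContext *new_context)
{
    BDRVRawState *s = bs->opaque;

    if (s->use_linux_aio && aio_setup_linux_aio(new_context) != 0) {
        error_report("Unable to use native AIO, falling back to thread pool");
        s->use_linux_aio = false;
    }
}
```

The open path can propagate the error to the user, while the attach callback centralises the fallback so the per-request paths in raw_co_prw(), raw_aio_plug() and raw_aio_unplug() would no longer need their own checks.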
On 20.06.2018 [11:57:42 +0200], Kevin Wolf wrote: > Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben: > > On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote: > > > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > > > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > > > > > <snip> > > > > > > > > > } else if (s->use_linux_aio) { > > > > > > + int rc; > > > > > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > > > > + if (rc != 0) { > > > > > > + error_report("Unable to use native AIO, falling back to " > > > > > > + "thread pool."); > > > > > > > > > > In general, error_report() should not output a trailing '.'. > > > > > > > > Will fix. > > > > > > > > > > + s->use_linux_aio = 0; > > > > > > + return rc; > > > > > > > > > > Wait - the message claims we are falling back, but the non-zero return code > > > > > sounds like we are returning an error instead of falling back. (My > > > > > preference - if the user requested something and we can't do it, it's better > > > > > to error than to fall back to something that does not match the user's > > > > > request). > > > > > > > > I think that makes sense, I hadn't tested this specific case (in my > > > > reading of the code, it wasn't clear to me if raw_co_prw() could be > > > > called before raw_aio_plug() had been called, but I think returning the > > > > error code up should be handled correctly. What about the cases where > > > > there is no error handling (the other two changes in the patch)? > > > > > > While looking at doing these changes, I realized that I'm not quite sure > > > what the right approach is here. My original rationale for returning > > > non-zero was that AIO was requested but could not be completed. I > > > haven't fully tracked back the calling paths, but I assumed it would get > > > retried at the top level, and since we indicated to not use AIO on > > > subsequent calls, it will succeed and use threads then (note, that I do > > > now realize this means a mismatch between the qemu command-line and the > > > in-use AIO model). > > > > > > In practice, with my v2 patch, where I do return a non-zero error-code > > > from this function, qemu does not exit (nor is any logging other than > > > that I added emitted on the monitor). If I do not fallback, I imagine we > > > would just continuously see this error message and IO might not actually > > > every occur? Reworking all of the callpath to fail on non-zero returns > > > from raw_co_prw() seems like a fair bit of work, but if that is what is > > > being requested, I can try that (it will just take a while). > > > Alternatively, I can produce a v3 quickly that does not bubble the > > > actual errno all the way up (since it does seem like it is ignored > > > anyways?). > > > > Sorry for the noise, but I had one more thought. Would it be appropriate > > to push the _setup() call up to when we parse the arguments about > > aio=native? E.g., we already check there if cache=directsync is > > specified and error out if not. > > We already do this: Right, I stated above it already is done, I simply meant adding a second check here that we can obtain and setup the AIO context successfully. 
> /* Currently Linux does AIO only for files opened with O_DIRECT */ > if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) { > error_setg(errp, "aio=native was specified, but it requires " > "cache.direct=on, which was not specified."); > ret = -EINVAL; > goto fail; > } > > laio_init() is about other types of errors. But anyway, yes, calling > laio_init() already in .bdrv_open() is possible. Returning errors from > .bdrv_open() is nice and easy and we should do it. Ack. > However, we may also need to call laio_init() again when switching to a > different I/O thread after the image is already opened. This is what I > meant when I commented on v1 that you should do this in the > .bdrv_attach_aio_context callback. The problem here is that we can't > return an error there and the guest is already using the image. In this > case, logging an error and falling back to the thread pool seems to be > the best option we have. Is this is a request for new functionality? Just trying to understand, because aiui, block/file-posix.c does not implement the bdrv_attach_aio_context callback currently. Instead, aio_get_linux_aio() is called from three places, raw_co_prw, raw_aio_plug and raw_aio_unplug, which calls into laio_init() and laio_attach_aio_context(). I can add the callback you suggest with appropriate error handling (I suppose it would point to laio_attach_aio_context, possibly with some modifications) and remove the call from aio_get_linux_aio()? Just trying to understand the request a bit better, as I don't see where exactly iothreads get switched and how that is implemented currently (and thus where laio_init() would get called again in the current code). Thanks in advance! -Nish
On 20.06.2018 [12:34:52 -0700], Nishanth Aravamudan wrote: > On 20.06.2018 [11:57:42 +0200], Kevin Wolf wrote: > > Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben: > > > On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote: > > > > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > > > > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > > > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > > > > > > > <snip> > > > > > > > > > > > } else if (s->use_linux_aio) { > > > > > > > + int rc; > > > > > > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > > > > > + if (rc != 0) { > > > > > > > + error_report("Unable to use native AIO, falling back to " > > > > > > > + "thread pool."); > > > > > > > > > > > > In general, error_report() should not output a trailing '.'. > > > > > > > > > > Will fix. > > > > > > > > > > > > + s->use_linux_aio = 0; > > > > > > > + return rc; > > > > > > > > > > > > Wait - the message claims we are falling back, but the non-zero return code > > > > > > sounds like we are returning an error instead of falling back. (My > > > > > > preference - if the user requested something and we can't do it, it's better > > > > > > to error than to fall back to something that does not match the user's > > > > > > request). > > > > > > > > > > I think that makes sense, I hadn't tested this specific case (in my > > > > > reading of the code, it wasn't clear to me if raw_co_prw() could be > > > > > called before raw_aio_plug() had been called, but I think returning the > > > > > error code up should be handled correctly. What about the cases where > > > > > there is no error handling (the other two changes in the patch)? > > > > > > > > While looking at doing these changes, I realized that I'm not quite sure > > > > what the right approach is here. My original rationale for returning > > > > non-zero was that AIO was requested but could not be completed. I > > > > haven't fully tracked back the calling paths, but I assumed it would get > > > > retried at the top level, and since we indicated to not use AIO on > > > > subsequent calls, it will succeed and use threads then (note, that I do > > > > now realize this means a mismatch between the qemu command-line and the > > > > in-use AIO model). > > > > > > > > In practice, with my v2 patch, where I do return a non-zero error-code > > > > from this function, qemu does not exit (nor is any logging other than > > > > that I added emitted on the monitor). If I do not fallback, I imagine we > > > > would just continuously see this error message and IO might not actually > > > > every occur? Reworking all of the callpath to fail on non-zero returns > > > > from raw_co_prw() seems like a fair bit of work, but if that is what is > > > > being requested, I can try that (it will just take a while). > > > > Alternatively, I can produce a v3 quickly that does not bubble the > > > > actual errno all the way up (since it does seem like it is ignored > > > > anyways?). > > > > > > Sorry for the noise, but I had one more thought. Would it be appropriate > > > to push the _setup() call up to when we parse the arguments about > > > aio=native? E.g., we already check there if cache=directsync is > > > specified and error out if not. > > > > We already do this: > > Right, I stated above it already is done, I simply meant adding a second > check here that we can obtain and setup the AIO context successfully. 
> > > /* Currently Linux does AIO only for files opened with O_DIRECT */ > > if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) { > > error_setg(errp, "aio=native was specified, but it requires " > > "cache.direct=on, which was not specified."); > > ret = -EINVAL; > > goto fail; > > } > > > > laio_init() is about other types of errors. But anyway, yes, calling > > laio_init() already in .bdrv_open() is possible. Returning errors from > > .bdrv_open() is nice and easy and we should do it. > > Ack. > > > However, we may also need to call laio_init() again when switching to a > > different I/O thread after the image is already opened. This is what I > > meant when I commented on v1 that you should do this in the > > .bdrv_attach_aio_context callback. The problem here is that we can't > > return an error there and the guest is already using the image. In this > > case, logging an error and falling back to the thread pool seems to be > > the best option we have. > > Is this is a request for new functionality? Just trying to understand, > because aiui, block/file-posix.c does not implement the > bdrv_attach_aio_context callback currently. Instead, aio_get_linux_aio() > is called from three places, raw_co_prw, raw_aio_plug and > raw_aio_unplug, which calls into laio_init() and > laio_attach_aio_context(). I can add the callback you suggest with > appropriate error handling (I suppose it would point to > laio_attach_aio_context, possibly with some modifications) and remove > the call from aio_get_linux_aio()? Just trying to understand the request > a bit better, as I don't see where exactly iothreads get switched and > how that is implemented currently (and thus where laio_init() would get > called again in the current code). While I waited for a reply to this, I started coding on what I think was being asked for and have come to the conclusion that there are actually three bugs here :) Test cases (with one disk attached to the VM): 1) Set /proc/sys/fs/max-aio-nr to 0. Specify aio=native and qemu dies with a SIGSEGV. - This case is understood and pushing the laio_init()-return code check to the bdrv_open() path fixes this (and allows for the failure to be communicated to the user). 2) Set /proc/sys/fs/max-aio-nr to 128. Specify aio=native and some number of IOThreads. Over qmp issue a x-blockdev-set-iothread command to move the block device node to one of the IOThreads. qemu eventually dies with a SIGSEGV. - I am fairly sure this is the case you described above, and is fixed by re-implementing the bdrv_{attach,detach}_aio_context callbacks. I have a patch that does this and successfully tested the SEGV is avoided. 3) Set /proc/sys/fs/max-aio-nr to 512 (I think 256 would be sufficient, though). Specify aio=native and some number of IOThreads. Over qmp issue a x-blockdev-set-iothread command to move the block device node to one of the IOThreads. Shutdown the guest normally. qemu dies with a SIGABRT. - This appears to be because there is a mismatch in aio_context_{acquire,release} calls (this is my hypothesis right now). The abort comes from bdrv_flush -> aio_context_release and an EPERM from qemu_mutex_unlock_impl() which I believe is just reflecting an EPERM from pthread_mutex_unlock? My theory is that the main qemu thread acquired the aio mutex but then the IOThread released it? I will try and trace the mutexes tomorrow, but I still don't have a fix for this case. Thanks, Nish
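For reference, the first case is the same scenario as in the commit message; a minimal reproduction sketch (image path and memory size are placeholders, and the sysctl file is fs.aio-max-nr):

```sh
# Exhaust the system-wide AIO context limit
echo 0 > /proc/sys/fs/aio-max-nr

# aio=native needs O_DIRECT, hence cache=directsync; without the fix,
# laio_init() fails and qemu crashes in laio_attach_aio_context()
qemu-system-x86_64 -m 1024 \
    -drive file=/path/to/guest.img,format=raw,aio=native,cache=directsync
```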
Am 21.06.2018 um 05:26 hat Nishanth Aravamudan geschrieben: > On 20.06.2018 [12:34:52 -0700], Nishanth Aravamudan wrote: > > On 20.06.2018 [11:57:42 +0200], Kevin Wolf wrote: > > > Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben: > > > > On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote: > > > > > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > > > > > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > > > > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > > > > > > > > > <snip> > > > > > > > > > > > > > } else if (s->use_linux_aio) { > > > > > > > > + int rc; > > > > > > > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > > > > > > + if (rc != 0) { > > > > > > > > + error_report("Unable to use native AIO, falling back to " > > > > > > > > + "thread pool."); > > > > > > > > > > > > > > In general, error_report() should not output a trailing '.'. > > > > > > > > > > > > Will fix. > > > > > > > > > > > > > > + s->use_linux_aio = 0; > > > > > > > > + return rc; > > > > > > > > > > > > > > Wait - the message claims we are falling back, but the non-zero return code > > > > > > > sounds like we are returning an error instead of falling back. (My > > > > > > > preference - if the user requested something and we can't do it, it's better > > > > > > > to error than to fall back to something that does not match the user's > > > > > > > request). > > > > > > > > > > > > I think that makes sense, I hadn't tested this specific case (in my > > > > > > reading of the code, it wasn't clear to me if raw_co_prw() could be > > > > > > called before raw_aio_plug() had been called, but I think returning the > > > > > > error code up should be handled correctly. What about the cases where > > > > > > there is no error handling (the other two changes in the patch)? > > > > > > > > > > While looking at doing these changes, I realized that I'm not quite sure > > > > > what the right approach is here. My original rationale for returning > > > > > non-zero was that AIO was requested but could not be completed. I > > > > > haven't fully tracked back the calling paths, but I assumed it would get > > > > > retried at the top level, and since we indicated to not use AIO on > > > > > subsequent calls, it will succeed and use threads then (note, that I do > > > > > now realize this means a mismatch between the qemu command-line and the > > > > > in-use AIO model). > > > > > > > > > > In practice, with my v2 patch, where I do return a non-zero error-code > > > > > from this function, qemu does not exit (nor is any logging other than > > > > > that I added emitted on the monitor). If I do not fallback, I imagine we > > > > > would just continuously see this error message and IO might not actually > > > > > every occur? Reworking all of the callpath to fail on non-zero returns > > > > > from raw_co_prw() seems like a fair bit of work, but if that is what is > > > > > being requested, I can try that (it will just take a while). > > > > > Alternatively, I can produce a v3 quickly that does not bubble the > > > > > actual errno all the way up (since it does seem like it is ignored > > > > > anyways?). > > > > > > > > Sorry for the noise, but I had one more thought. Would it be appropriate > > > > to push the _setup() call up to when we parse the arguments about > > > > aio=native? E.g., we already check there if cache=directsync is > > > > specified and error out if not. 
> > > > > > We already do this: > > > > Right, I stated above it already is done, I simply meant adding a second > > check here that we can obtain and setup the AIO context successfully. Yes, sorry, I misread. > > > /* Currently Linux does AIO only for files opened with O_DIRECT */ > > > if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) { > > > error_setg(errp, "aio=native was specified, but it requires " > > > "cache.direct=on, which was not specified."); > > > ret = -EINVAL; > > > goto fail; > > > } > > > > > > laio_init() is about other types of errors. But anyway, yes, calling > > > laio_init() already in .bdrv_open() is possible. Returning errors from > > > .bdrv_open() is nice and easy and we should do it. > > > > Ack. > > > > > However, we may also need to call laio_init() again when switching to a > > > different I/O thread after the image is already opened. This is what I > > > meant when I commented on v1 that you should do this in the > > > .bdrv_attach_aio_context callback. The problem here is that we can't > > > return an error there and the guest is already using the image. In this > > > case, logging an error and falling back to the thread pool seems to be > > > the best option we have. > > > > Is this is a request for new functionality? Just trying to understand, > > because aiui, block/file-posix.c does not implement the > > bdrv_attach_aio_context callback currently. Instead, aio_get_linux_aio() > > is called from three places, raw_co_prw, raw_aio_plug and > > raw_aio_unplug, which calls into laio_init() and > > laio_attach_aio_context(). I can add the callback you suggest with > > appropriate error handling (I suppose it would point to > > laio_attach_aio_context, possibly with some modifications) and remove > > the call from aio_get_linux_aio()? Just trying to understand the request > > a bit better, as I don't see where exactly iothreads get switched and > > how that is implemented currently (and thus where laio_init() would get > > called again in the current code). It's not new functionality, but moving things to a different place. Our goal is that aio_get_linux_aio() never returns NULL in the I/O path so that we don't have to duplicate the error handling everywhere. For each AioContext, aio_get_linux_aio() can return NULL only the first time. Once it successfully returned a LinuxAioState, it will never return NULL again for the same AioContext. You cover the QEMU main loop AioContext when you call aio_get_linux_aio() in raw_open_common(). But whenever the AioContext is changed, we can get NULL again on the next aio_get_linux_aio() call because we don't know whether that AioContext already successfully returned a LinuxAioState before. If you call once it in .bdrv_attach_aio_context() and disable Linux AIO if it fails, you have centralised all the error paths that you currently need in all I/O paths. > While I waited for a reply to this, I started coding on what I think was > being asked for and have come to the conclusion that there are actually > three bugs here :) > > Test cases (with one disk attached to the VM): > > 1) Set /proc/sys/fs/max-aio-nr to 0. Specify aio=native and qemu dies > with a SIGSEGV. > - This case is understood and pushing the laio_init()-return code > check to the bdrv_open() path fixes this (and allows for the > failure to be communicated to the user). Right. > 2) Set /proc/sys/fs/max-aio-nr to 128. Specify aio=native and some > number of IOThreads. 
Over qmp issue a x-blockdev-set-iothread command to > move the block device node to one of the IOThreads. qemu eventually dies > with a SIGSEGV. > - I am fairly sure this is the case you described above, and is > fixed by re-implementing the bdrv_{attach,detach}_aio_context > callbacks. I have a patch that does this and successfully tested > the SEGV is avoided. > > 3) Set /proc/sys/fs/max-aio-nr to 512 (I think 256 would be sufficient, > though). Specify aio=native and some number of IOThreads. Over qmp issue > a x-blockdev-set-iothread command to move the block device node to one > of the IOThreads. Shutdown the guest normally. qemu dies with a SIGABRT. > - This appears to be because there is a mismatch in > aio_context_{acquire,release} calls (this is my hypothesis right > now). The abort comes from bdrv_flush -> aio_context_release and > an EPERM from qemu_mutex_unlock_impl() which I believe is just > reflecting an EPERM from pthread_mutex_unlock? My theory is that > the main qemu thread acquired the aio mutex but then the IOThread > released it? I will try and trace the mutexes tomorrow, but I > still don't have a fix for this case. x-blockdev-set-iothread is not the proper way to achieve what you want. You can only correctly use it with images that are not attached to any guest device. If you move a block node to a different I/O thread (and therefore AioContext) under the feet of the guest device, crashes aren't completely unexpected. The proper way to test this would be -device virtio-blk,iothread=... In this case, the raw BlockDriverState is first created in the main AioContext, but when the virtio-blk device is initialised, it moves it to the specified I/O thread. Kevin
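In concrete terms, the test Kevin describes looks roughly like this (the iothread and drive ids are illustrative):

```sh
qemu-system-x86_64 -m 1024 \
    -object iothread,id=iothread0 \
    -drive file=/path/to/guest.img,format=raw,if=none,id=drive0,aio=native,cache=directsync \
    -device virtio-blk-pci,drive=drive0,iothread=iothread0
```

The raw node is first opened in the main loop AioContext and then handed to iothread0 when the virtio-blk device is realized, which exercises exactly the AioContext switch that needs a second laio_init().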
On 21.06.2018 [15:51:08 +0200], Kevin Wolf wrote: > Am 21.06.2018 um 05:26 hat Nishanth Aravamudan geschrieben: > > On 20.06.2018 [12:34:52 -0700], Nishanth Aravamudan wrote: > > > On 20.06.2018 [11:57:42 +0200], Kevin Wolf wrote: > > > > Am 20.06.2018 um 00:54 hat Nishanth Aravamudan geschrieben: > > > > > On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote: > > > > > > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > > > > > > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > > > > > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > > > > > > > > > > > <snip> > > > > > > > > > > > > > > > } else if (s->use_linux_aio) { > > > > > > > > > + int rc; > > > > > > > > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > > > > > > > + if (rc != 0) { > > > > > > > > > + error_report("Unable to use native AIO, falling back to " > > > > > > > > > + "thread pool."); > > > > > > > > > > > > > > > > In general, error_report() should not output a trailing '.'. > > > > > > > > > > > > > > Will fix. > > > > > > > > > > > > > > > > + s->use_linux_aio = 0; > > > > > > > > > + return rc; > > > > > > > > > > > > > > > > Wait - the message claims we are falling back, but the non-zero return code > > > > > > > > sounds like we are returning an error instead of falling back. (My > > > > > > > > preference - if the user requested something and we can't do it, it's better > > > > > > > > to error than to fall back to something that does not match the user's > > > > > > > > request). > > > > > > > > > > > > > > I think that makes sense, I hadn't tested this specific case (in my > > > > > > > reading of the code, it wasn't clear to me if raw_co_prw() could be > > > > > > > called before raw_aio_plug() had been called, but I think returning the > > > > > > > error code up should be handled correctly. What about the cases where > > > > > > > there is no error handling (the other two changes in the patch)? > > > > > > > > > > > > While looking at doing these changes, I realized that I'm not quite sure > > > > > > what the right approach is here. My original rationale for returning > > > > > > non-zero was that AIO was requested but could not be completed. I > > > > > > haven't fully tracked back the calling paths, but I assumed it would get > > > > > > retried at the top level, and since we indicated to not use AIO on > > > > > > subsequent calls, it will succeed and use threads then (note, that I do > > > > > > now realize this means a mismatch between the qemu command-line and the > > > > > > in-use AIO model). > > > > > > > > > > > > In practice, with my v2 patch, where I do return a non-zero error-code > > > > > > from this function, qemu does not exit (nor is any logging other than > > > > > > that I added emitted on the monitor). If I do not fallback, I imagine we > > > > > > would just continuously see this error message and IO might not actually > > > > > > every occur? Reworking all of the callpath to fail on non-zero returns > > > > > > from raw_co_prw() seems like a fair bit of work, but if that is what is > > > > > > being requested, I can try that (it will just take a while). > > > > > > Alternatively, I can produce a v3 quickly that does not bubble the > > > > > > actual errno all the way up (since it does seem like it is ignored > > > > > > anyways?). > > > > > > > > > > Sorry for the noise, but I had one more thought. Would it be appropriate > > > > > to push the _setup() call up to when we parse the arguments about > > > > > aio=native? 
E.g., we already check there if cache=directsync is > > > > > specified and error out if not. > > > > > > > > We already do this: > > > > > > Right, I stated above it already is done, I simply meant adding a second > > > check here that we can obtain and setup the AIO context successfully. > > Yes, sorry, I misread. > > > > > /* Currently Linux does AIO only for files opened with O_DIRECT */ > > > > if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) { > > > > error_setg(errp, "aio=native was specified, but it requires " > > > > "cache.direct=on, which was not specified."); > > > > ret = -EINVAL; > > > > goto fail; > > > > } > > > > > > > > laio_init() is about other types of errors. But anyway, yes, calling > > > > laio_init() already in .bdrv_open() is possible. Returning errors from > > > > .bdrv_open() is nice and easy and we should do it. > > > > > > Ack. > > > > > > > However, we may also need to call laio_init() again when switching to a > > > > different I/O thread after the image is already opened. This is what I > > > > meant when I commented on v1 that you should do this in the > > > > .bdrv_attach_aio_context callback. The problem here is that we can't > > > > return an error there and the guest is already using the image. In this > > > > case, logging an error and falling back to the thread pool seems to be > > > > the best option we have. > > > > > > Is this is a request for new functionality? Just trying to understand, > > > because aiui, block/file-posix.c does not implement the > > > bdrv_attach_aio_context callback currently. Instead, aio_get_linux_aio() > > > is called from three places, raw_co_prw, raw_aio_plug and > > > raw_aio_unplug, which calls into laio_init() and > > > laio_attach_aio_context(). I can add the callback you suggest with > > > appropriate error handling (I suppose it would point to > > > laio_attach_aio_context, possibly with some modifications) and remove > > > the call from aio_get_linux_aio()? Just trying to understand the request > > > a bit better, as I don't see where exactly iothreads get switched and > > > how that is implemented currently (and thus where laio_init() would get > > > called again in the current code). > > It's not new functionality, but moving things to a different place. > > Our goal is that aio_get_linux_aio() never returns NULL in the I/O path > so that we don't have to duplicate the error handling everywhere. For > each AioContext, aio_get_linux_aio() can return NULL only the first > time. Once it successfully returned a LinuxAioState, it will never > return NULL again for the same AioContext. > > You cover the QEMU main loop AioContext when you call > aio_get_linux_aio() in raw_open_common(). But whenever the AioContext is > changed, we can get NULL again on the next aio_get_linux_aio() call > because we don't know whether that AioContext already successfully > returned a LinuxAioState before. > > If you call once it in .bdrv_attach_aio_context() and disable Linux AIO > if it fails, you have centralised all the error paths that you currently > need in all I/O paths. Thanks for the clarification! > > While I waited for a reply to this, I started coding on what I think was > > being asked for and have come to the conclusion that there are actually > > three bugs here :) > > > > Test cases (with one disk attached to the VM): > > > > 1) Set /proc/sys/fs/max-aio-nr to 0. Specify aio=native and qemu dies > > with a SIGSEGV. 
> > - This case is understood and pushing the laio_init()-return code > > check to the bdrv_open() path fixes this (and allows for the > > failure to be communicated to the user). > > Right. > > > 2) Set /proc/sys/fs/max-aio-nr to 128. Specify aio=native and some > > number of IOThreads. Over qmp issue a x-blockdev-set-iothread command to > > move the block device node to one of the IOThreads. qemu eventually dies > > with a SIGSEGV. > > - I am fairly sure this is the case you described above, and is > > fixed by re-implementing the bdrv_{attach,detach}_aio_context > > callbacks. I have a patch that does this and successfully tested > > the SEGV is avoided. > > > > 3) Set /proc/sys/fs/max-aio-nr to 512 (I think 256 would be sufficient, > > though). Specify aio=native and some number of IOThreads. Over qmp issue > > a x-blockdev-set-iothread command to move the block device node to one > > of the IOThreads. Shutdown the guest normally. qemu dies with a SIGABRT. > > - This appears to be because there is a mismatch in > > aio_context_{acquire,release} calls (this is my hypothesis right > > now). The abort comes from bdrv_flush -> aio_context_release and > > an EPERM from qemu_mutex_unlock_impl() which I believe is just > > reflecting an EPERM from pthread_mutex_unlock? My theory is that > > the main qemu thread acquired the aio mutex but then the IOThread > > released it? I will try and trace the mutexes tomorrow, but I > > still don't have a fix for this case. > > x-blockdev-set-iothread is not the proper way to achieve what you want. > You can only correctly use it with images that are not attached to any > guest device. If you move a block node to a different I/O thread (and > therefore AioContext) under the feet of the guest device, crashes aren't > completely unexpected. > > The proper way to test this would be -device virtio-blk,iothread=... > In this case, the raw BlockDriverState is first created in the main > AioContext, but when the virtio-blk device is initialised, it moves it > to the specified I/O thread. Thanks again! Turns out 2) is definitely present in master. 3) does not occur. So I will work on making my patch into 2 patches and repost a v3 later today. Thanks, Nish
diff --git a/block/file-posix.c b/block/file-posix.c
index 07bb061fe4..2415d09bf1 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1665,6 +1665,14 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, uint64_t offset,
         type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_AIO
     } else if (s->use_linux_aio) {
+        int rc;
+        rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
+        if (rc != 0) {
+            error_report("Unable to use native AIO, falling back to "
+                         "thread pool.");
+            s->use_linux_aio = 0;
+            return rc;
+        }
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         assert(qiov->size == bytes);
         return laio_co_submit(bs, aio, s->fd, offset, qiov, type);
@@ -1695,6 +1703,14 @@ static void raw_aio_plug(BlockDriverState *bs)
 #ifdef CONFIG_LINUX_AIO
     BDRVRawState *s = bs->opaque;
     if (s->use_linux_aio) {
+        int rc;
+        rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
+        if (rc != 0) {
+            error_report("Unable to use native AIO, falling back to "
+                         "thread pool.");
+            s->use_linux_aio = 0;
+            return;
+        }
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         laio_io_plug(bs, aio);
     }
@@ -1706,6 +1722,14 @@ static void raw_aio_unplug(BlockDriverState *bs)
 #ifdef CONFIG_LINUX_AIO
     BDRVRawState *s = bs->opaque;
     if (s->use_linux_aio) {
+        int rc;
+        rc = aio_setup_linux_aio(bdrv_get_aio_context(bs));
+        if (rc != 0) {
+            error_report("Unable to use native AIO, falling back to "
+                         "thread pool.");
+            s->use_linux_aio = 0;
+            return;
+        }
         LinuxAioState *aio = aio_get_linux_aio(bdrv_get_aio_context(bs));
         laio_io_unplug(bs, aio);
     }
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 88b8d55ec7..4d799f85fe 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -470,28 +470,33 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
                            qemu_laio_poll_cb);
 }
 
-LinuxAioState *laio_init(void)
+int laio_init(LinuxAioState **linux_aio)
 {
+    int rc;
     LinuxAioState *s;
 
     s = g_malloc0(sizeof(*s));
-    if (event_notifier_init(&s->e, false) < 0) {
+    rc = event_notifier_init(&s->e, false);
+    if (rc < 0) {
         goto out_free_state;
     }
 
-    if (io_setup(MAX_EVENTS, &s->ctx) != 0) {
+    rc = io_setup(MAX_EVENTS, &s->ctx);
+    if (rc != 0) {
         goto out_close_efd;
     }
 
     ioq_init(&s->io_q);
 
-    return s;
+    *linux_aio = s;
+    return 0;
 
 out_close_efd:
     event_notifier_cleanup(&s->e);
 out_free_state:
     g_free(s);
-    return NULL;
+    *linux_aio = NULL;
+    return rc;
 }
 
 void laio_cleanup(LinuxAioState *s)
diff --git a/include/block/aio.h b/include/block/aio.h
index ae6f354e6c..8900516ac5 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -381,6 +381,9 @@ GSource *aio_get_g_source(AioContext *ctx);
 /* Return the ThreadPool bound to this AioContext */
 struct ThreadPool *aio_get_thread_pool(AioContext *ctx);
 
+/* Setup the LinuxAioState bound to this AioContext */
+int aio_setup_linux_aio(AioContext *ctx);
+
 /* Return the LinuxAioState bound to this AioContext */
 struct LinuxAioState *aio_get_linux_aio(AioContext *ctx);
 
diff --git a/include/block/raw-aio.h b/include/block/raw-aio.h
index 0e717fd475..81b90e5fc6 100644
--- a/include/block/raw-aio.h
+++ b/include/block/raw-aio.h
@@ -43,7 +43,7 @@
 /* linux-aio.c - Linux native implementation */
 #ifdef CONFIG_LINUX_AIO
 typedef struct LinuxAioState LinuxAioState;
-LinuxAioState *laio_init(void);
+int laio_init(LinuxAioState **linux_aio);
 void laio_cleanup(LinuxAioState *s);
 int coroutine_fn laio_co_submit(BlockDriverState *bs, LinuxAioState *s, int fd,
                                 uint64_t offset, QEMUIOVector *qiov, int type);
diff --git a/stubs/linux-aio.c b/stubs/linux-aio.c
index ed47bd443c..88ab927e35 100644
--- a/stubs/linux-aio.c
+++ b/stubs/linux-aio.c
@@ -21,7 +21,7 @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context)
     abort();
 }
 
-LinuxAioState *laio_init(void)
+int laio_init(LinuxAioState **linux_aio)
 {
     abort();
 }
diff --git a/util/async.c b/util/async.c
index 03f62787f2..b3fb67af3c 100644
--- a/util/async.c
+++ b/util/async.c
@@ -323,12 +323,21 @@ ThreadPool *aio_get_thread_pool(AioContext *ctx)
 }
 
 #ifdef CONFIG_LINUX_AIO
-LinuxAioState *aio_get_linux_aio(AioContext *ctx)
+int aio_setup_linux_aio(AioContext *ctx)
 {
+    int rc;
+    rc = 0;
     if (!ctx->linux_aio) {
-        ctx->linux_aio = laio_init();
-        laio_attach_aio_context(ctx->linux_aio, ctx);
+        rc = laio_init(&ctx->linux_aio);
+        if (rc == 0) {
+            laio_attach_aio_context(ctx->linux_aio, ctx);
+        }
     }
+    return rc;
+}
+
+LinuxAioState *aio_get_linux_aio(AioContext *ctx)
+{
     return ctx->linux_aio;
 }
 #endif
laio_init() can fail for a couple of reasons, which will lead to a NULL pointer dereference in laio_attach_aio_context().

To solve this, add a aio_setup_linux_aio() function which is called before aio_get_linux_aio() where it is called currently, and which propogates setup errors up. The signature of aio_get_linux_aio() was not modified, because it seems preferable to return the actual errno from the possible failing initialization calls.

With respect to the error-handling in the file-posix.c, we properly bubble any errors up in raw_co_prw and in the case s of raw_aio_{,un}plug, the result is the same as if s->use_linux_aio was not set (but there is no bubbling up). In all three cases, if the setup function fails, we fallback to the thread pool and an error message is emitted.

It is trivial to make qemu segfault in my testing. Set /proc/sys/fs/aio-max-nr to 0 and start a guest with aio=native,cache=directsync. With this patch, the guest successfully starts (but obviously isn't using native AIO). Setting aio-max-nr back up to a reasonable value, AIO contexts are consumed normally.

Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com>

---

Changes from v1 -> v2:

Rather than affect virtio-scsi/blk at all, make all the changes internal to file-posix.c. Thanks to Kevin Wolf for the suggested change.
---
 block/file-posix.c      | 24 ++++++++++++++++++++++++
 block/linux-aio.c       | 15 ++++++++++-----
 include/block/aio.h     |  3 +++
 include/block/raw-aio.h |  2 +-
 stubs/linux-aio.c       |  2 +-
 util/async.c            | 15 ++++++++++++---
 6 files changed, 51 insertions(+), 10 deletions(-)