Message ID | 20230403183004.347205-1-stefanha@redhat.com (mailing list archive) |
---|---|
Headers | show |
Series | block: remove aio_disable_external() API | expand |
On 4/3/23 20:29, Stefan Hajnoczi wrote: > The aio_disable_external() API temporarily suspends file descriptor monitoring > in the event loop. The block layer uses this to prevent new I/O requests being > submitted from the guest and elsewhere between bdrv_drained_begin() and > bdrv_drained_end(). > > While the block layer still needs to prevent new I/O requests in drained > sections, the aio_disable_external() API can be replaced with > .drained_begin/end/poll() callbacks that have been added to BdrvChildClass and > BlockDevOps. > > This newer .bdrained_begin/end/poll() approach is attractive because it works > without specifying a specific AioContext. The block layer is moving towards > multi-queue and that means multiple AioContexts may be processing I/O > simultaneously. > > The aio_disable_external() was always somewhat hacky. It suspends all file > descriptors that were registered with is_external=true, even if they have > nothing to do with the BlockDriverState graph nodes that are being drained. > It's better to solve a block layer problem in the block layer than to have an > odd event loop API solution. > > That covers the motivation for this change, now on to the specifics of this > series: > > While it would be nice if a single conceptual approach could be applied to all > is_external=true file descriptors, I ended up looking at callers on a > case-by-case basis. There are two general ways I migrated code away from > is_external=true: > > 1. Block exports are typically best off unregistering fds in .drained_begin() > and registering them again in .drained_end(). The .drained_poll() function > waits for in-flight requests to finish using a reference counter. > > 2. Emulated storage controllers like virtio-blk and virtio-scsi are a little > simpler. They can rely on BlockBackend's request queuing during drain > feature. Guest I/O request coroutines are suspended in a drained section and > resume upon the end of the drained section. Sorry, I disagree with this. Request queuing was shown to cause deadlocks; Hanna's latest patch is piling another hack upon it, instead in my opinion we should go in the direction of relying _less_ (or not at all) on request queuing. I am strongly convinced that request queuing must apply only after bdrv_drained_begin has returned, which would also fix the IDE TRIM bug reported by Fiona Ebner. The possible livelock scenario is generally not a problem because 1) outside an iothread you have anyway the BQL that prevents a vCPU from issuing more I/O operations during bdrv_drained_begin 2) in iothreads you have aio_disable_external() instead of .drained_begin(). It is also less tidy to start a request during the drained_begin phase, because a request that has been submitted has to be completed (cancel doesn't really work). So in an ideal world, request queuing would not only apply only after bdrv_drained_begin has returned, it would log a warning and .drained_begin() should set up things so that there are no such warnings. Thanks, Paolo
On Tue, Apr 04, 2023 at 03:43:20PM +0200, Paolo Bonzini wrote: > On 4/3/23 20:29, Stefan Hajnoczi wrote: > > The aio_disable_external() API temporarily suspends file descriptor monitoring > > in the event loop. The block layer uses this to prevent new I/O requests being > > submitted from the guest and elsewhere between bdrv_drained_begin() and > > bdrv_drained_end(). > > > > While the block layer still needs to prevent new I/O requests in drained > > sections, the aio_disable_external() API can be replaced with > > .drained_begin/end/poll() callbacks that have been added to BdrvChildClass and > > BlockDevOps. > > > > This newer .bdrained_begin/end/poll() approach is attractive because it works > > without specifying a specific AioContext. The block layer is moving towards > > multi-queue and that means multiple AioContexts may be processing I/O > > simultaneously. > > > > The aio_disable_external() was always somewhat hacky. It suspends all file > > descriptors that were registered with is_external=true, even if they have > > nothing to do with the BlockDriverState graph nodes that are being drained. > > It's better to solve a block layer problem in the block layer than to have an > > odd event loop API solution. > > > > That covers the motivation for this change, now on to the specifics of this > > series: > > > > While it would be nice if a single conceptual approach could be applied to all > > is_external=true file descriptors, I ended up looking at callers on a > > case-by-case basis. There are two general ways I migrated code away from > > is_external=true: > > > > 1. Block exports are typically best off unregistering fds in .drained_begin() > > and registering them again in .drained_end(). The .drained_poll() function > > waits for in-flight requests to finish using a reference counter. > > > > 2. Emulated storage controllers like virtio-blk and virtio-scsi are a little > > simpler. They can rely on BlockBackend's request queuing during drain > > feature. Guest I/O request coroutines are suspended in a drained section and > > resume upon the end of the drained section. > > Sorry, I disagree with this. > > Request queuing was shown to cause deadlocks; Hanna's latest patch is piling > another hack upon it, instead in my opinion we should go in the direction of > relying _less_ (or not at all) on request queuing. > > I am strongly convinced that request queuing must apply only after > bdrv_drained_begin has returned, which would also fix the IDE TRIM bug > reported by Fiona Ebner. The possible livelock scenario is generally not a > problem because 1) outside an iothread you have anyway the BQL that prevents > a vCPU from issuing more I/O operations during bdrv_drained_begin 2) in > iothreads you have aio_disable_external() instead of .drained_begin(). > > It is also less tidy to start a request during the drained_begin phase, > because a request that has been submitted has to be completed (cancel > doesn't really work). > > So in an ideal world, request queuing would not only apply only after > bdrv_drained_begin has returned, it would log a warning and .drained_begin() > should set up things so that there are no such warnings. That's fine, I will give .drained_begin/end/poll() a try with virtio-blk and virtio-scsi in the next revision. Stefan