Message ID | 20230719195417.1704513-5-axboe@kernel.dk (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | Improve async iomap DIO performance | expand |
> /* can use bio alloc cache */ > #define IOCB_ALLOC_CACHE (1 << 21) > +/* > + * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the > + * iocb completion can be passed back to the owner for execution from a safe > + * context rather than needing to be punted through a workqueue. If this > + * flag is set, the completion handling may set iocb->dio_complete to a > + * handler, which the issuer will then call from task context to complete > + * the processing of the iocb. iocb->private should then also be set to > + * the argument being passed to this handler. Can you add an explanation when it is safe/destirable to do the deferred completion? As of the last patch we seem to avoid anything that does I/O or transaction commits, but we'd still allow blocking operations like mutexes used in the zonefs completion handler. We need to catch this so future usuers know what to do. Similarly on the iomap side I think we need clear documentation for what context ->end_io can be called in now.
On 7/19/23 11:01?PM, Christoph Hellwig wrote: >> /* can use bio alloc cache */ >> #define IOCB_ALLOC_CACHE (1 << 21) >> +/* >> + * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the >> + * iocb completion can be passed back to the owner for execution from a safe >> + * context rather than needing to be punted through a workqueue. If this >> + * flag is set, the completion handling may set iocb->dio_complete to a >> + * handler, which the issuer will then call from task context to complete >> + * the processing of the iocb. iocb->private should then also be set to >> + * the argument being passed to this handler. > > Can you add an explanation when it is safe/destirable to do the deferred > completion? As of the last patch we seem to avoid anything that does > I/O or transaction commits, but we'd still allow blocking operations > like mutexes used in the zonefs completion handler. We need to catch > this so future usuers know what to do. Sure can do - generally it's exactly as you write, anything that does extra IO should still be done in a workqueue to avoid stalling this particular IO completion on further IO. > Similarly on the iomap side I think we need clear documentation for > what context ->end_io can be called in now. Honestly that was needed before as well, not really related to this change (or the next two patches) as they don't change the context of how it is called.
diff --git a/include/linux/fs.h b/include/linux/fs.h index 6867512907d6..115382f66d79 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -338,6 +338,16 @@ enum rw_hint { #define IOCB_NOIO (1 << 20) /* can use bio alloc cache */ #define IOCB_ALLOC_CACHE (1 << 21) +/* + * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the + * iocb completion can be passed back to the owner for execution from a safe + * context rather than needing to be punted through a workqueue. If this + * flag is set, the completion handling may set iocb->dio_complete to a + * handler, which the issuer will then call from task context to complete + * the processing of the iocb. iocb->private should then also be set to + * the argument being passed to this handler. + */ +#define IOCB_DIO_DEFER (1 << 22) /* for use in trace events */ #define TRACE_IOCB_STRINGS \ @@ -351,7 +361,8 @@ enum rw_hint { { IOCB_WRITE, "WRITE" }, \ { IOCB_WAITQ, "WAITQ" }, \ { IOCB_NOIO, "NOIO" }, \ - { IOCB_ALLOC_CACHE, "ALLOC_CACHE" } + { IOCB_ALLOC_CACHE, "ALLOC_CACHE" }, \ + { IOCB_DIO_DEFER, "DIO_DEFER" } struct kiocb { struct file *ki_filp; @@ -360,7 +371,22 @@ struct kiocb { void *private; int ki_flags; u16 ki_ioprio; /* See linux/ioprio.h */ - struct wait_page_queue *ki_waitq; /* for async buffered IO */ + union { + /* + * Only used for async buffered reads, where it denotes the + * page waitqueue associated with completing the read. Valid + * IFF IOCB_WAITQ is set. + */ + struct wait_page_queue *ki_waitq; + /* + * Can be used for O_DIRECT IO, where the completion handling + * is punted back to the issuer of the IO. May only be set + * if IOCB_DIO_DEFER is set by the issuer, and the issuer must + * then check for presence of this handler when ki_complete is + * invoked. + */ + ssize_t (*dio_complete)(void *data); + }; }; static inline bool is_sync_kiocb(struct kiocb *kiocb)
Async dio completions generally happen from hard/soft IRQ context, which means that users like iomap may need to defer some of the completion handling to a workqueue. This is less efficient than having the original issuer handle it, like we do for sync IO, and it adds latency to the completions. Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely punt these completions to a safe context. If the dio handler is aware of this flag, assign a callback handler in kiocb->dio_complete and associated data io kiocb->private. The issuer will then call this handler with that data from task context. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- include/linux/fs.h | 30 ++++++++++++++++++++++++++++-- 1 file changed, 28 insertions(+), 2 deletions(-)