Message ID: 20230419092233.56338-1-luhongfei@vivo.com (mailing list archive)
State: New
Series: io_uring: Optimization of buffered random write
On 4/19/23 3:22 AM, luhongfei wrote:
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 4a865f0e85d0..64bb91beb4d6
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2075,8 +2075,23 @@ static inline void io_queue_sqe(struct io_kiocb *req)
>  	__must_hold(&req->ctx->uring_lock)
>  {
>  	int ret;
> +	bool is_write;
>  
> -	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> +	switch (req->opcode) {
> +	case IORING_OP_WRITEV:
> +	case IORING_OP_WRITE_FIXED:
> +	case IORING_OP_WRITE:
> +		is_write = true;
> +		break;
> +	default:
> +		is_write = false;
> +		break;
> +	}
> +
> +	if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
> +		ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> +	else
> +		ret = io_issue_sqe(req, 0);
>  
>  	/*
>  	 * We async punt it if the file wasn't marked NOWAIT, or if the file

We really can't just do that, implicitly. What you are doing is making
any write synchronous. What are you writing to in terms of device or
file? If a file, what file system is being used? Curious if the target
supports async buffered writes; guessing it does not, which is why you
see io-wq activity for all of them.

That said, I did toss out a test patch a while back that explicitly sets
up the ring such that we'll do blocking IO rather than do a non-blocking
attempt and then punt it if that fails. And I do think there's a use
case for that, in case you just want to use io_uring for batched
syscalls and don't care if you end up blocking for some IO.

Let's do a primer on what happens for io_uring issue:

1) Non-blocking issue is attempted for IO. If successful, we're done for
   now.

2) Case 1 failed. Now we have two options:
   a) We can poll the file. We arm poll, and we're done for now until
      that triggers.
   b) File cannot be polled, we punt to io-wq which then does a
      blocking attempt.

For case 2b, this is the one where we could've just done a blocking
attempt initially if the ring was setup with a flag explicitly saying
that's what the application wants. Or io_uring_enter() had a flag passed
in that explicitly said this is what the application wants. I suspect
we'll want both, to cover both SQPOLL and !SQPOLL.

I'd recommend we still retain non-blocking issue for pollable files, as
you could very quickly block forever otherwise. Imagine an empty pipe
and a read issued to it in blocking mode.

A solution like that would cater to your case too, without potentially
breaking a lot of things like your patch could. The key here is the
explicit nature of it, we cannot just go and make odd assumptions about
a particular opcode type (writes) and ring type (SQPOLL) and say "oh
this one is fine for just ignoring blocking off the issue path".
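The three outcomes in the primer above can be sketched as a tiny decision function. This is an illustrative model only; `issue`, `ISSUE_DONE`, and the other names here are not kernel symbols:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the io_uring issue path described above. */
enum issue_outcome {
	ISSUE_DONE,		/* case 1: non-blocking attempt succeeded */
	ISSUE_ARM_POLL,		/* case 2a: pollable file, wait for readiness */
	ISSUE_PUNT_IOWQ,	/* case 2b: punt to io-wq for a blocking attempt */
};

static enum issue_outcome issue(bool nonblock_ok, bool pollable)
{
	if (nonblock_ok)
		return ISSUE_DONE;
	if (pollable)
		return ISSUE_ARM_POLL;
	return ISSUE_PUNT_IOWQ;
}
```

Case 2b is the only branch where an up-front blocking issue would have saved work, which is what the flag proposal below the primer targets.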
On 4/19/23 7:32 AM, Jens Axboe wrote:
> On 4/19/23 3:22 AM, luhongfei wrote:
> [...]
>
> A solution like that would cater to your case too, without potentially
> breaking a lot of things like your patch could. The key here is the
> explicit nature of it, we cannot just go and make odd assumptions about
> a particular opcode type (writes) and ring type (SQPOLL) and say "oh
> this one is fine for just ignoring blocking off the issue path".

Something like this, totally untested. You can either setup the ring
with IORING_SETUP_NO_OFFLOAD, or you can pass in IORING_ENTER_NO_OFFLOAD
to achieve the same thing but on a per-invocation of io_uring_enter(2)
basis.

I suspect this would be cleaner with an io_kiocb flag for this, so we
can make the retry paths correct as well and avoid passing 'no_offload'
too much around. I'll probably clean it up with that and actually try
and test it.

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 0716cb17e436..ea903a677ce9 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -173,6 +173,12 @@ enum {
  */
 #define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
 
+/*
+ * Don't attempt non-blocking issue on file types that would otherwise
+ * punt to io-wq if they cannot be completed non-blocking.
+ */
+#define IORING_SETUP_NO_OFFLOAD	(1U << 14)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
@@ -443,6 +449,7 @@ struct io_cqring_offsets {
 #define IORING_ENTER_SQ_WAIT	(1U << 2)
 #define IORING_ENTER_EXT_ARG	(1U << 3)
 #define IORING_ENTER_REGISTERED_RING	(1U << 4)
+#define IORING_ENTER_NO_OFFLOAD	(1U << 5)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3bca7a79efda..431e41701991 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -147,7 +147,7 @@ static bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 static void io_dismantle_req(struct io_kiocb *req);
 static void io_clean_op(struct io_kiocb *req);
-static void io_queue_sqe(struct io_kiocb *req);
+static void io_queue_sqe(struct io_kiocb *req, bool no_offload);
 static void io_move_task_work_from_local(struct io_ring_ctx *ctx);
 static void __io_submit_flush_completions(struct io_ring_ctx *ctx);
 static __cold void io_fallback_tw(struct io_uring_task *tctx);
@@ -1471,7 +1471,7 @@ void io_req_task_submit(struct io_kiocb *req, struct io_tw_state *ts)
 	else if (req->flags & REQ_F_FORCE_ASYNC)
 		io_queue_iowq(req, ts);
 	else
-		io_queue_sqe(req);
+		io_queue_sqe(req, false);
 }
 
 void io_req_task_queue_fail(struct io_kiocb *req, int ret)
@@ -1938,7 +1938,8 @@ static bool io_assign_file(struct io_kiocb *req, const struct io_issue_def *def,
 	return !!req->file;
 }
 
-static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
+static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags,
+			bool no_offload)
 {
 	const struct io_issue_def *def = &io_issue_defs[req->opcode];
 	const struct cred *creds = NULL;
@@ -1947,6 +1948,9 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!io_assign_file(req, def, issue_flags)))
 		return -EBADF;
 
+	if (no_offload && (!req->file || !file_can_poll(req->file)))
+		issue_flags &= ~IO_URING_F_NONBLOCK;
+
 	if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
 		creds = override_creds(req->creds);
@@ -1980,7 +1984,7 @@ int io_poll_issue(struct io_kiocb *req, struct io_tw_state *ts)
 {
 	io_tw_lock(req->ctx, ts);
 	return io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_MULTISHOT|
-				 IO_URING_F_COMPLETE_DEFER);
+				 IO_URING_F_COMPLETE_DEFER, false);
 }
 
 struct io_wq_work *io_wq_free_work(struct io_wq_work *work)
@@ -2029,7 +2033,7 @@ void io_wq_submit_work(struct io_wq_work *work)
 	}
 
 	do {
-		ret = io_issue_sqe(req, issue_flags);
+		ret = io_issue_sqe(req, issue_flags, false);
 		if (ret != -EAGAIN)
 			break;
 		/*
@@ -2120,12 +2124,13 @@ static void io_queue_async(struct io_kiocb *req, int ret)
 		io_queue_linked_timeout(linked_timeout);
 }
 
-static inline void io_queue_sqe(struct io_kiocb *req)
+static inline void io_queue_sqe(struct io_kiocb *req, bool no_offload)
 	__must_hold(&req->ctx->uring_lock)
 {
 	int ret;
 
-	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER,
+			   no_offload);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
@@ -2337,7 +2342,7 @@ static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
 }
 
 static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
-			 const struct io_uring_sqe *sqe)
+			 const struct io_uring_sqe *sqe, bool no_offload)
 	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_link *link = &ctx->submit_state.link;
@@ -2385,7 +2390,7 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		return 0;
 	}
 
-	io_queue_sqe(req);
+	io_queue_sqe(req, no_offload);
 	return 0;
 }
 
@@ -2466,7 +2471,7 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 	return false;
 }
 
-int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
+int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, bool no_offload)
 	__must_hold(&ctx->uring_lock)
 {
 	unsigned int entries = io_sqring_entries(ctx);
@@ -2495,7 +2500,7 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 		 * Continue submitting even for sqe failure if the
 		 * ring was setup with IORING_SETUP_SUBMIT_ALL
 		 */
-		if (unlikely(io_submit_sqe(ctx, req, sqe)) &&
+		if (unlikely(io_submit_sqe(ctx, req, sqe, no_offload)) &&
 		    !(ctx->flags & IORING_SETUP_SUBMIT_ALL)) {
 			left--;
 			break;
@@ -3524,7 +3529,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
 			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
-			       IORING_ENTER_REGISTERED_RING)))
+			       IORING_ENTER_REGISTERED_RING |
+			       IORING_ENTER_NO_OFFLOAD)))
 		return -EINVAL;
 
 	/*
@@ -3575,12 +3581,17 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		ret = to_submit;
 	} else if (to_submit) {
+		bool no_offload;
+
 		ret = io_uring_add_tctx_node(ctx);
 		if (unlikely(ret))
 			goto out;
 
+		no_offload = flags & IORING_ENTER_NO_OFFLOAD ||
+			     ctx->flags & IORING_SETUP_NO_OFFLOAD;
+
 		mutex_lock(&ctx->uring_lock);
-		ret = io_submit_sqes(ctx, to_submit);
+		ret = io_submit_sqes(ctx, to_submit, no_offload);
 		if (ret != to_submit) {
 			mutex_unlock(&ctx->uring_lock);
 			goto out;
@@ -3969,7 +3980,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
 			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
-			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN))
+			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
+			IORING_SETUP_NO_OFFLOAD))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 25515d69d205..c5c0db7232c0 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -76,7 +76,7 @@ int io_uring_alloc_task_context(struct task_struct *task,
 				struct io_ring_ctx *ctx);
 
 int io_poll_issue(struct io_kiocb *req, struct io_tw_state *ts);
-int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr);
+int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr, bool no_offload);
 int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin);
 void io_free_batch_list(struct io_ring_ctx *ctx, struct io_wq_work_node *node);
 int io_req_prep_async(struct io_kiocb *req);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 9db4bc1f521a..9a9417bf9e3f 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -166,6 +166,7 @@ static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
 
 static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 {
+	bool no_offload = ctx->flags & IORING_SETUP_NO_OFFLOAD;
 	unsigned int to_submit;
 	int ret = 0;
 
@@ -190,7 +191,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 	 */
 	if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
 	    !(ctx->flags & IORING_SETUP_R_DISABLED))
-		ret = io_submit_sqes(ctx, to_submit);
+		ret = io_submit_sqes(ctx, to_submit, no_offload);
 	mutex_unlock(&ctx->uring_lock);
 
 	if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
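The two key hunks of the RFC patch can be modeled in a few lines of userspace C. The flag values are taken from the patch itself and do not exist in mainline uapi headers, and the `IO_URING_F_NONBLOCK` value here is a placeholder (the real kernel-internal constant differs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values from the RFC patch above; defined locally for illustration. */
#define IORING_SETUP_NO_OFFLOAD	(1U << 14)
#define IORING_ENTER_NO_OFFLOAD	(1U << 5)
#define IO_URING_F_NONBLOCK	(1U << 0)	/* placeholder value */

/* Mirrors the io_uring_enter() hunk: either the per-ring setup flag or the
 * per-syscall enter flag requests inline (possibly blocking) issue. */
static bool wants_no_offload(uint32_t ctx_flags, uint32_t enter_flags)
{
	return (enter_flags & IORING_ENTER_NO_OFFLOAD) ||
	       (ctx_flags & IORING_SETUP_NO_OFFLOAD);
}

/* Mirrors the io_issue_sqe() hunk: only non-pollable files lose the
 * non-blocking flag, so pollable files (pipes, sockets) still get a
 * non-blocking attempt and cannot block forever. */
static uint32_t effective_issue_flags(uint32_t issue_flags, bool no_offload,
				      bool file_pollable)
{
	if (no_offload && !file_pollable)
		issue_flags &= ~IO_URING_F_NONBLOCK;
	return issue_flags;
}
```

This captures the explicitness argued for above: nothing changes unless the application opts in via setup or enter flags, and pollable files keep the non-blocking fast path either way.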
Hi luhongfei,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc7 next-20230418]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/luhongfei/io_uring-Optimization-of-buffered-random-write/20230419-172539
patch link: https://lore.kernel.org/r/20230419092233.56338-1-luhongfei%40vivo.com
patch subject: [PATCH] io_uring: Optimization of buffered random write
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20230420/202304200351.LIOui4Xc-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/620dbcc5ab192992f08035fd9d271ffffb8ff043
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review luhongfei/io_uring-Optimization-of-buffered-random-write/20230419-172539
        git checkout 620dbcc5ab192992f08035fd9d271ffffb8ff043
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=x86_64 olddefconfig
        make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304200351.LIOui4Xc-lkp@intel.com/

All errors (new ones prefixed by >>):

   io_uring/io_uring.c: In function 'io_queue_sqe':
>> io_uring/io_uring.c:2091:30: error: 'struct io_kiocb' has no member named 'rw'
    2091 |         if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
         |                              ^~

vim +2091 io_uring/io_uring.c

  2073
  2074	static inline void io_queue_sqe(struct io_kiocb *req)
  2075		__must_hold(&req->ctx->uring_lock)
  2076	{
  2077		int ret;
  2078		bool is_write;
  2079
  2080		switch (req->opcode) {
  2081		case IORING_OP_WRITEV:
  2082		case IORING_OP_WRITE_FIXED:
  2083		case IORING_OP_WRITE:
  2084			is_write = true;
  2085			break;
  2086		default:
  2087			is_write = false;
  2088			break;
  2089		}
  2090
> 2091		if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
  2092			ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
  2093		else
  2094			ret = io_issue_sqe(req, 0);
  2095
  2096		/*
  2097		 * We async punt it if the file wasn't marked NOWAIT, or if the file
  2098		 * doesn't support non-blocking read/write attempts
  2099		 */
  2100		if (likely(!ret))
  2101			io_arm_ltimeout(req);
  2102		else
  2103			io_queue_async(req, ret);
  2104	}
  2105
Hi luhongfei,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc7 next-20230418]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/luhongfei/io_uring-Optimization-of-buffered-random-write/20230419-172539
patch link: https://lore.kernel.org/r/20230419092233.56338-1-luhongfei%40vivo.com
patch subject: [PATCH] io_uring: Optimization of buffered random write
config: i386-randconfig-a012-20230417 (https://download.01.org/0day-ci/archive/20230420/202304200502.T4Waeqad-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/620dbcc5ab192992f08035fd9d271ffffb8ff043
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review luhongfei/io_uring-Optimization-of-buffered-random-write/20230419-172539
        git checkout 620dbcc5ab192992f08035fd9d271ffffb8ff043
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304200502.T4Waeqad-lkp@intel.com/

All errors (new ones prefixed by >>):

>> io_uring/io_uring.c:2091:25: error: no member named 'rw' in 'struct io_kiocb'
           if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
                                 ~~~ ^
   1 error generated.

vim +2091 io_uring/io_uring.c

  2073
  2074	static inline void io_queue_sqe(struct io_kiocb *req)
  2075		__must_hold(&req->ctx->uring_lock)
  2076	{
  2077		int ret;
  2078		bool is_write;
  2079
  2080		switch (req->opcode) {
  2081		case IORING_OP_WRITEV:
  2082		case IORING_OP_WRITE_FIXED:
  2083		case IORING_OP_WRITE:
  2084			is_write = true;
  2085			break;
  2086		default:
  2087			is_write = false;
  2088			break;
  2089		}
  2090
> 2091		if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
  2092			ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
  2093		else
  2094			ret = io_issue_sqe(req, 0);
  2095
  2096		/*
  2097		 * We async punt it if the file wasn't marked NOWAIT, or if the file
  2098		 * doesn't support non-blocking read/write attempts
  2099		 */
  2100		if (likely(!ret))
  2101			io_arm_ltimeout(req);
  2102		else
  2103			io_queue_async(req, ret);
  2104	}
  2105
On 4/19/23 3:30 PM, kernel test robot wrote:
> Hi luhongfei,
>
> kernel test robot noticed the following build errors:
> [...]
>
> All errors (new ones prefixed by >>):
>
>>> io_uring/io_uring.c:2091:25: error: no member named 'rw' in 'struct io_kiocb'
>            if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
>                                  ~~~ ^
>    1 error generated.

The patch just can't work. Looks like it was forward ported from an
older kernel, but not even compiled on a recent kernel. There's no
req->rw.kiocb, hasn't been the case since 5.19. And you also can't do
layering violations like this, req->rw is rw.c private and cannot even
be used in io_uring.c.
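The layering rule cited here can be illustrated with a hypothetical sketch: the core request struct exposes only opaque per-opcode storage, and the private view lives behind a cast helper, similar in spirit to io_uring's io_kiocb_to_cmd(). All type and function names below are made up for illustration, not kernel code:

```c
#include <assert.h>
#include <string.h>

/* Core request type: per-opcode state is an opaque byte area that core
 * code never interprets. */
struct core_req {
	int opcode;
	unsigned char cmd[32];	/* opaque to core code */
};

/* Private to the "rw" layer; only rw code may use this type, so core
 * code cannot peek at fields like ki_flags the way the rejected patch
 * tried to. */
struct rw_priv {
	int ki_flags;
};

static struct rw_priv *req_to_rw(struct core_req *req)
{
	/* The cast helper is the single sanctioned crossing point. */
	return (struct rw_priv *)(void *)req->cmd;
}
```

The point of the pattern is that the dependency runs one way: the opcode layer knows the core type, but the core only ever sees opaque bytes.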
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4a865f0e85d0..64bb91beb4d6
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2075,8 +2075,23 @@ static inline void io_queue_sqe(struct io_kiocb *req)
 	__must_hold(&req->ctx->uring_lock)
 {
 	int ret;
+	bool is_write;
 
-	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+	switch (req->opcode) {
+	case IORING_OP_WRITEV:
+	case IORING_OP_WRITE_FIXED:
+	case IORING_OP_WRITE:
+		is_write = true;
+		break;
+	default:
+		is_write = false;
+		break;
+	}
+
+	if (!is_write || (req->rw.kiocb.ki_flags & IOCB_DIRECT))
+		ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+	else
+		ret = io_issue_sqe(req, 0);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
The buffered random write performance of io_uring is poor due to the
following reason:

By default, when performing buffered random writes, io_sq_thread will
call io_issue_sqe to issue the write req, but because
IO_URING_F_NONBLOCK is set, the req is executed asynchronously in an
iou-wrk thread, where io_wq_submit_work calls io_issue_sqe to complete
the write req with issue_flags of IO_URING_F_UNLOCKED |
IO_URING_F_IOWQ, which reduces performance.

This patch determines whether a req is a buffered random write, and if
so, io_sq_thread directly calls io_issue_sqe(req, 0) to complete the
req instead of completing it asynchronously in an iou-wrk thread.

Performance results: for fio, the following results were obtained with
a queue depth of 8 and a 4k block size:

random writes:
        without patch   with patch   libaio    psync
iops:   287k            560k         248K      324K
bw:     1123MB/s        2188MB/s     970MB/s   1267MB/s
clat:   52760ns         69918ns      28405ns   2109ns

Signed-off-by: luhongfei <luhongfei@vivo.com>
---
 io_uring/io_uring.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
 mode change 100644 => 100755 io_uring/io_uring.c
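For reference, a fio job along the following lines should reproduce the workload described above (buffered 4k random writes at queue depth 8 via io_uring with an SQPOLL submission thread); the filename, size, and runtime are illustrative choices, not taken from the original report:

```ini
[global]
ioengine=io_uring
sqthread_poll=1        ; offload submission to an SQPOLL kernel thread
rw=randwrite
bs=4k
iodepth=8
direct=0               ; buffered (page-cache) writes, not O_DIRECT
size=1g
runtime=30
time_based

[buffered-randwrite]
filename=/data/fio-testfile   ; illustrative path
```

Comparison numbers for libaio and psync can be obtained by changing `ioengine=` (and dropping `sqthread_poll`, which is specific to the io_uring engine).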