| Message ID | 20230928022228.15770-4-xiaobing.li@samsung.com (mailing list archive) |
|---|---|
| State | New |
| Series | [1/3] SCHEDULER: Add an interface for counting real utilization. |
On Thu, Sep 28, 2023 at 10:22:28AM +0800, Xiaobing Li wrote:
> Since the sq thread has a while(1) structure, during this process, there
> may be a lot of time that is not processing IO but does not exceed the
> timeout period, therefore, the sqpoll thread will keep running and will
> keep occupying the CPU. Obviously, the CPU is wasted at this time; our
> goal is to count the part of the time that the sqpoll thread actually
> processes IO, so as to reflect the part of the CPU it uses to process
> IO, which can be used to help improve the actual utilization of the CPU
> in the future.
>
> Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com>
> ---
>  io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..2c5fc4d95fa8 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -11,6 +11,7 @@
>  #include <linux/audit.h>
>  #include <linux/security.h>
>  #include <linux/io_uring.h>
> +#include <linux/time.h>
>
>  #include <uapi/linux/io_uring.h>
>
> @@ -235,6 +236,10 @@ static int io_sq_thread(void *data)
>          set_cpus_allowed_ptr(current, cpu_online_mask);
>
>      mutex_lock(&sqd->lock);
> +    bool first = true;
> +    struct timespec64 ts_start, ts_end;
> +    struct timespec64 ts_delta;
> +    struct sched_entity *se = &sqd->thread->se;
>      while (1) {
>          bool cap_entries, sqt_spin = false;
>
> @@ -243,7 +248,16 @@ static int io_sq_thread(void *data)
>                  break;
>              timeout = jiffies + sqd->sq_thread_idle;
>          }
> -
> +        ktime_get_boottime_ts64(&ts_start);
> +        ts_delta = timespec64_sub(ts_start, ts_end);
> +        unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
> +                     se->sq_avg.last_update_time;
> +
> +        if (first) {
> +            now = 0;
> +            first = false;
> +        }
> +        __update_sq_avg_block(now, se);
>          cap_entries = !list_is_singular(&sqd->ctx_list);
>          list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
>              int ret = __io_sq_thread(ctx, cap_entries);
>
> @@ -251,6 +265,16 @@ static int io_sq_thread(void *data)
>              if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
>                  sqt_spin = true;
>          }
> +
> +        ktime_get_boottime_ts64(&ts_end);
> +        ts_delta = timespec64_sub(ts_end, ts_start);
> +        now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
> +              se->sq_avg.last_update_time;
> +
> +        if (sqt_spin)
> +            __update_sq_avg(now, se);
> +        else
> +            __update_sq_avg_block(now, se);
>          if (io_run_task_work())
>              sqt_spin = true;

All of this is quite insane, but the above is actually broken. You're
using wall-time to measure runtime of a preemptible thread.

On top of that, for extra insanity, you're using the frigging insane
timespec interface, because clearly the _ns() interfaces are too
complicated or something?

And that whole first thing is daft too, pull now out of the loop and
set it before, then all that goes away.

Now, I see what you're trying to do, but who actually uses this data?

Finally, please don't scream in the subject :/
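For reference, a minimal sketch of the restructuring being asked for here: switch to the ns-based clock interface and take the timestamp once before the loop, which removes both the timespec64 arithmetic and the "first" special case. This assumes the series' __update_sq_avg()/__update_sq_avg_block() helpers and the sq_avg member (none of which exist in mainline), and it deliberately does not address the separate objection that wall time is the wrong thing to measure for a preemptible thread:

    u64 ts_start, ts_end, now;

    ts_end = ktime_get_boottime_ns();    /* set once, before the loop */
    while (1) {
        /* ... existing event/timeout handling ... */

        ts_start = ktime_get_boottime_ns();
        /* gap since the end of the previous iteration */
        now = se->sq_avg.last_update_time + (ts_start - ts_end);
        __update_sq_avg_block(now, se);

        /* ... run __io_sq_thread() over the ctx list ... */

        ts_end = ktime_get_boottime_ns();
        /* time spent in this iteration's work section */
        now = se->sq_avg.last_update_time + (ts_end - ts_start);
        if (sqt_spin)
            __update_sq_avg(now, se);
        else
            __update_sq_avg_block(now, se);

        /* ... rest of the loop ... */
    }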
On 9/28/23 2:01 AM, Peter Zijlstra wrote:
> All of this is quite insane, but the above is actually broken. You're
> using wall-time to measure runtime of a preemptible thread.

Would have to agree with that... wall-time specifics aside, this whole
thing seems to attempt to solve an issue quite the wrong way around.

> Now, I see what you're trying to do, but who actually uses this data?

This is the important question - and if you need this data, then why not
just account it in sqpoll itself and have some way to export it? Let's
please not implicate the core kernel bits for that.
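A minimal sketch of that alternative, assuming two hypothetical u64 fields (work_time and total_time) added to struct io_sq_data: the accounting then lives entirely inside io_uring and could be exported later (e.g. through fdinfo) without touching struct sched_entity at all:

    /* in io_sq_thread(), wrapped around the existing work section */
    u64 start = ktime_get_ns();

    cap_entries = !list_is_singular(&sqd->ctx_list);
    list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
        int ret = __io_sq_thread(ctx, cap_entries);

        if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
            sqt_spin = true;
    }

    /* only iterations that actually found work count as "actual work" */
    if (sqt_spin)
        sqd->work_time += ktime_get_ns() - start;    /* hypothetical field */
    sqd->total_time += ktime_get_ns() - start;       /* hypothetical field */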
On Thu, Sep 28, 2023 at 10:01:14AM +0200, Peter Zijlstra wrote:
> Now, I see what you're trying to do, but who actually uses this data?
I ... don't. There seems to be the notion that since we're polling, that
shouldn't count against the runtime of the thread. But the thread has
chosen to poll! It is doing something! For one thing, it's preventing
the CPU from entering an idle state. It seems absolutely fair to
account this poll time to the runtime of the thread. Clearly I'm
missing something.
On 9/28/23 2:37 AM, Matthew Wilcox wrote:
> On Thu, Sep 28, 2023 at 10:01:14AM +0200, Peter Zijlstra wrote:
>> Now, I see what you're trying to do, but who actually uses this data?
>
> I ... don't. There seems to be the notion that since we're polling, that
> shouldn't count against the runtime of the thread. But the thread has
> chosen to poll! It is doing something! For one thing, it's preventing
> the CPU from entering an idle state. It seems absolutely fair to
> account this poll time to the runtime of the thread. Clearly I'm
> missing something.

For sure, it should be accounted as CPU time, as it is exactly that. You
could argue that if we needed to preempt this task for something else we
would do that (and the code does check that on every loop), but it's
still using CPU.

I can see maybe wanting to know how much of the total time the thread
spent doing ACTUAL work rather than just polling for new work, but
that's not really something the scheduler should be involved in and
should be purely an io_uring sqpoll stat of some sort if that is truly
interesting for an application.
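If sqpoll exported such a stat through the ring fd's fdinfo, deriving the busy ratio would be a plain userspace computation. A sketch, with the "SqWorkTime"/"SqTotalTime" field names invented purely for illustration (no such fields exist):

    #include <stdio.h>

    int main(void)
    {
        /* a real application would use /proc/self/fdinfo/<ring fd> */
        FILE *f = fopen("/proc/self/fdinfo/3", "r");
        unsigned long long work = 0, total = 0;
        char line[256];

        if (!f)
            return 1;
        while (fgets(line, sizeof(line), f)) {
            sscanf(line, "SqWorkTime: %llu", &work);
            sscanf(line, "SqTotalTime: %llu", &total);
        }
        fclose(f);
        if (total)
            printf("sqpoll doing actual work: %.1f%%\n",
                   100.0 * work / total);
        return 0;
    }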
Hi Xiaobing,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on linus/master v6.6-rc3 next-20230928]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiaobing-Li/SCHEDULER-Add-an-interface-for-counting-real-utilization/20230928-103219
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20230928022228.15770-4-xiaobing.li%40samsung.com
patch subject: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
config: arm-mmp2_defconfig (https://download.01.org/0day-ci/archive/20230929/202309290242.FeOyvZYI-lkp@intel.com/config)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230929/202309290242.FeOyvZYI-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309290242.FeOyvZYI-lkp@intel.com/

All errors (new ones prefixed by >>):

   io_uring/sqpoll.c:254:7: error: no member named 'sq_avg' in 'struct sched_entity'
                           se->sq_avg.last_update_time;
                           ~~  ^
>> io_uring/sqpoll.c:260:3: error: call to undeclared function '__update_sq_avg_block'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
                   __update_sq_avg_block(now, se);
                   ^
   io_uring/sqpoll.c:272:7: error: no member named 'sq_avg' in 'struct sched_entity'
                           se->sq_avg.last_update_time;
                           ~~  ^
>> io_uring/sqpoll.c:275:4: error: call to undeclared function '__update_sq_avg'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
                   __update_sq_avg(now, se);
                   ^
   4 errors generated.
vim +/__update_sq_avg_block +260 io_uring/sqpoll.c

   221	
   222	static int io_sq_thread(void *data)
   223	{
   224	    struct io_sq_data *sqd = data;
   225	    struct io_ring_ctx *ctx;
   226	    unsigned long timeout = 0;
   227	    char buf[TASK_COMM_LEN];
   228	    DEFINE_WAIT(wait);
   229	
   230	    snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
   231	    set_task_comm(current, buf);
   232	
   233	    if (sqd->sq_cpu != -1)
   234	        set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
   235	    else
   236	        set_cpus_allowed_ptr(current, cpu_online_mask);
   237	
   238	    mutex_lock(&sqd->lock);
   239	    bool first = true;
   240	    struct timespec64 ts_start, ts_end;
   241	    struct timespec64 ts_delta;
   242	    struct sched_entity *se = &sqd->thread->se;
   243	    while (1) {
   244	        bool cap_entries, sqt_spin = false;
   245	
   246	        if (io_sqd_events_pending(sqd) || signal_pending(current)) {
   247	            if (io_sqd_handle_event(sqd))
   248	                break;
   249	            timeout = jiffies + sqd->sq_thread_idle;
   250	        }
   251	        ktime_get_boottime_ts64(&ts_start);
   252	        ts_delta = timespec64_sub(ts_start, ts_end);
   253	        unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
   254	                     se->sq_avg.last_update_time;
   255	
   256	        if (first) {
   257	            now = 0;
   258	            first = false;
   259	        }
 > 260	        __update_sq_avg_block(now, se);
   261	        cap_entries = !list_is_singular(&sqd->ctx_list);
   262	        list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
   263	            int ret = __io_sq_thread(ctx, cap_entries);
   264	
   265	            if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
   266	                sqt_spin = true;
   267	        }
   268	
   269	        ktime_get_boottime_ts64(&ts_end);
   270	        ts_delta = timespec64_sub(ts_end, ts_start);
   271	        now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
   272	              se->sq_avg.last_update_time;
   273	
   274	        if (sqt_spin)
 > 275	            __update_sq_avg(now, se);
   276	        else
   277	            __update_sq_avg_block(now, se);
   278	        if (io_run_task_work())
   279	            sqt_spin = true;
   280	
   281	        if (sqt_spin || !time_after(jiffies, timeout)) {
   282	            if (sqt_spin)
   283	                timeout = jiffies + sqd->sq_thread_idle;
   284	            if (unlikely(need_resched())) {
   285	                mutex_unlock(&sqd->lock);
   286	                cond_resched();
   287	                mutex_lock(&sqd->lock);
   288	            }
   289	            continue;
   290	        }
   291	
   292	        prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
   293	        if (!io_sqd_events_pending(sqd) && !task_work_pending(current)) {
   294	            bool needs_sched = true;
   295	
   296	            list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
   297	                atomic_or(IORING_SQ_NEED_WAKEUP,
   298	                          &ctx->rings->sq_flags);
   299	                if ((ctx->flags & IORING_SETUP_IOPOLL) &&
   300	                    !wq_list_empty(&ctx->iopoll_list)) {
   301	                    needs_sched = false;
   302	                    break;
   303	                }
   304	
   305	                /*
   306	                 * Ensure the store of the wakeup flag is not
   307	                 * reordered with the load of the SQ tail
   308	                 */
   309	                smp_mb__after_atomic();
   310	
   311	                if (io_sqring_entries(ctx)) {
   312	                    needs_sched = false;
   313	                    break;
   314	                }
   315	            }
   316	
   317	            if (needs_sched) {
   318	                mutex_unlock(&sqd->lock);
   319	                schedule();
   320	                mutex_lock(&sqd->lock);
   321	            }
   322	            list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
   323	                atomic_andnot(IORING_SQ_NEED_WAKEUP,
   324	                              &ctx->rings->sq_flags);
   325	        }
   326	
   327	        finish_wait(&sqd->wait, &wait);
   328	        timeout = jiffies + sqd->sq_thread_idle;
   329	    }
   330	
   331	    io_uring_cancel_generic(true, sqd);
   332	    sqd->thread = NULL;
   333	    list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
   334	        atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
   335	    io_run_task_work();
   336	    mutex_unlock(&sqd->lock);
   337	
   338	    complete(&sqd->exited);
   339	    do_exit(0);
   340	}
   341	
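The errors above simply reflect that this 3/3 patch was build-tested without the scheduler side of the series. Inferred purely from the diagnostics (not from the actual 1/3 patch), the missing pieces would have to look roughly like:

    /* hypothetical scheduler-side declarations this patch depends on */
    struct sched_avg sq_avg;    /* new member of struct sched_entity,
                                   per the errors at lines 254 and 272 */

    void __update_sq_avg(u64 now, struct sched_entity *se);
    void __update_sq_avg_block(u64 now, struct sched_entity *se);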
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index bd6c2c7959a5..2c5fc4d95fa8 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -11,6 +11,7 @@
 #include <linux/audit.h>
 #include <linux/security.h>
 #include <linux/io_uring.h>
+#include <linux/time.h>
 
 #include <uapi/linux/io_uring.h>
 
@@ -235,6 +236,10 @@ static int io_sq_thread(void *data)
         set_cpus_allowed_ptr(current, cpu_online_mask);
 
     mutex_lock(&sqd->lock);
+    bool first = true;
+    struct timespec64 ts_start, ts_end;
+    struct timespec64 ts_delta;
+    struct sched_entity *se = &sqd->thread->se;
     while (1) {
         bool cap_entries, sqt_spin = false;
 
@@ -243,7 +248,16 @@ static int io_sq_thread(void *data)
                 break;
             timeout = jiffies + sqd->sq_thread_idle;
         }
-
+        ktime_get_boottime_ts64(&ts_start);
+        ts_delta = timespec64_sub(ts_start, ts_end);
+        unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
+                     se->sq_avg.last_update_time;
+
+        if (first) {
+            now = 0;
+            first = false;
+        }
+        __update_sq_avg_block(now, se);
         cap_entries = !list_is_singular(&sqd->ctx_list);
         list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
             int ret = __io_sq_thread(ctx, cap_entries);
 
@@ -251,6 +265,16 @@ static int io_sq_thread(void *data)
             if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
                 sqt_spin = true;
         }
+
+        ktime_get_boottime_ts64(&ts_end);
+        ts_delta = timespec64_sub(ts_end, ts_start);
+        now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
+              se->sq_avg.last_update_time;
+
+        if (sqt_spin)
+            __update_sq_avg(now, se);
+        else
+            __update_sq_avg_block(now, se);
         if (io_run_task_work())
             sqt_spin = true;
Since the sq thread has a while(1) structure, during this process, there
may be a lot of time that is not processing IO but does not exceed the
timeout period, therefore, the sqpoll thread will keep running and will
keep occupying the CPU. Obviously, the CPU is wasted at this time; our
goal is to count the part of the time that the sqpoll thread actually
processes IO, so as to reflect the part of the CPU it uses to process
IO, which can be used to help improve the actual utilization of the CPU
in the future.

Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com>
---
 io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)