diff mbox series

[v5] io_uring: Statistics of the true utilization of sq threads.

Message ID 20231218085152.14720-1-xiaobing.li@samsung.com (mailing list archive)
State New
Headers show
Series [v5] io_uring: Statistics of the true utilization of sq threads. | expand

Commit Message

Xiaobing Li Dec. 18, 2023, 8:51 a.m. UTC
The running time of the sq thread and the actual IO processing time are
counted, and the proportion of time actually used to process IO is
output as a percentage.

Variable description:
"work_time" in the code represents the sum of the jiffies of the sq
thread actually processing IO, that is, how many milliseconds it
actually takes to process IO. "total_time" represents the total time
that the sq thread has elapsed from the beginning of the loop to the
current time point, that is, how many milliseconds it has spent in
total.
The output "SqBusy" represents the percentage of time utilization that
the sq thread actually uses to process IO.

The test results are as follows:
Every 0.5s: cat /proc/23112/fdinfo/6 | grep Sq
SqMask: 0x3
SqHead: 1168417
SqTail: 1168418
CachedSqHead:   1168418
SqThread:       23112
SqThreadCpu:    55
SqBusy: 97%

changes:
v5:
 - list the changes in each iteration.
 
v4:
 - Resubmit the patch based on removing sq->lock
 - https://lore.kernel.org/io-uring/20231213032513.12591-1-xiaobing.li@samsung.com/T/#u
 
v3:
 - output actual working time as a percentage of total time
 - detailed description of the meaning of each variable
 - added test results
 - https://lore.kernel.org/io-uring/50ec567f-6b79-42ea-bf2c-2c9b2ecb427d@suswa.mountain/T/#t
 
v2:
 - output the total statistical time and work time to fdinfo
 - https://lore.kernel.org/io-uring/9e2b679c-fc1e-3d83-2303-e053f330a21a@gmail.com/T/#t

v1:
 - initial version
 - Statistics of total time and work time
 - https://lore.kernel.org/io-uring/2a1bdb5a-1216-45b0-a78d-5542b36ccd17@kernel.dk/T/#t

Signed-off-by: Xiaobing Li <xiaobing.li@samsung.com>
---
 io_uring/fdinfo.c | 4 ++++
 io_uring/sqpoll.c | 8 ++++++++
 io_uring/sqpoll.h | 2 ++
 3 files changed, 14 insertions(+)

Comments

Jens Axboe Dec. 18, 2023, 3:53 p.m. UTC | #1
On 12/18/23 1:51 AM, Xiaobing Li wrote:
> The running time of the sq thread and the actual IO processing time are
> counted, and the proportion of time actually used to process IO is
> output as a percentage.
> 
> Variable description:
> "work_time" in the code represents the sum of the jiffies of the sq
> thread actually processing IO, that is, how many milliseconds it
> actually takes to process IO. "total_time" represents the total time
> that the sq thread has elapsed from the beginning of the loop to the
> current time point, that is, how many milliseconds it has spent in
> total.
> The output "SqBusy" represents the percentage of time utilization that
> the sq thread actually uses to process IO.> 
> The test results are as follows:
> Every 0.5s: cat /proc/23112/fdinfo/6 | grep Sq
> SqMask: 0x3
> SqHead: 1168417
> SqTail: 1168418
> CachedSqHead:   1168418
> SqThread:       23112
> SqThreadCpu:    55
> SqBusy: 97%

I think I'm convinced that the effectiveness of the chosen SQPOLL
settings being exposed is useful, I'm just not sure fdinfo is the right
place to do it. Is it going to be a problem that these are just
perpetual stats, with no way to reset them? This means there's no way to
monitor it for a period of time and get effectiveness for something
specific, it'll always just count from when the ring was created.

We could of course have the act of reading the stat also reset it, but
maybe that'd be a bit odd?

Alternatively, it could be exported differently, eg as a register opcode
perhaps.

Open to suggestions...
Xiaobing Li Dec. 22, 2023, 8:35 a.m. UTC | #2
On 12/18/23 15:53, Jens Axboe wrote:
> I think I'm convinced that the effectiveness of the chosen SQPOLL
> settings being exposed is useful, I'm just not sure fdinfo is the right
> place to do it. Is it going to be a problem that these are just
> perpetual stats, with no way to reset them? This means there's no way to
> monitor it for a period of time and get effectiveness for something
> specific, it'll always just count from when the ring was created.
> 
> We could of course have the act of reading the stat also reset it, but
> maybe that'd be a bit odd?
> 
> Alternatively, it could be exported differently, eg as a register opcode
> perhaps.
> 
> Open to suggestions...

I thought carefully about your proposed reset stat, and I think it can be 
achieved by outputting "work_time" and "total_time".
eg:
Output at time t1:
SqMask: 0x3
SqHead: 1168417
SqTail: 1168418
SqWorkTime: t1_work
SqTotalTime: t1_total

Output at time t2:
SqMask: 0x3
SqHead: 1168417
SqTail: 1168418
SqWorkTime: t2_work
SqTotalTime: t2_total

Then we can manually calculate the utilization rate from t1 to t2:
(t2_work - t1_work) / (t2_total - t1_total)

Not sure what you think, but if you think it doesn't work, I'll look into 
other good ways to add the ability to reset.

In addition, on register opcode - generally it is used for resource like
buffers, handles etc.. I am not sure how that can help here. If you have
something in mind, could you please elaborate in more detail?
Jens Axboe Dec. 22, 2023, 2:31 p.m. UTC | #3
On 12/22/23 1:35 AM, Xiaobing Li wrote:
> On 12/18/23 15:53, Jens Axboe wrote:
>> I think I'm convinced that the effectiveness of the chosen SQPOLL
>> settings being exposed is useful, I'm just not sure fdinfo is the right
>> place to do it. Is it going to be a problem that these are just
>> perpetual stats, with no way to reset them? This means there's no way to
>> monitor it for a period of time and get effectiveness for something
>> specific, it'll always just count from when the ring was created.
>>
>> We could of course have the act of reading the stat also reset it, but
>> maybe that'd be a bit odd?
>>
>> Alternatively, it could be exported differently, eg as a register opcode
>> perhaps.
>>
>> Open to suggestions...
> 
> I thought carefully about your proposed reset stat, and I think it can be 
> achieved by outputting "work_time" and "total_time".
> eg:
> Output at time t1:
> SqMask: 0x3
> SqHead: 1168417
> SqTail: 1168418
> SqWorkTime: t1_work
> SqTotalTime: t1_total
> 
> Output at time t2:
> SqMask: 0x3
> SqHead: 1168417
> SqTail: 1168418
> SqWorkTime: t2_work
> SqTotalTime: t2_total
> 
> Then we can manually calculate the utilization rate from t1 to t2:
> (t2_work - t1_work) / (t2_total - t1_total)
> 
> Not sure what you think, but if you think it doesn't work, I'll look into 
> other good ways to add the ability to reset.

Yep that would work, just leave the stats calculation to the tool
querying it. Which is really how it should be.

> In addition, on register opcode - generally it is used for resource like
> buffers, handles etc.. I am not sure how that can help here. If you have
> something in mind, could you please elaborate in more detail?

It's also a bit of a dumping ground for any kind of out-of-band
mechanism, so it would work fine for something like this too. But since
we already have fdinfo and with your idea of just logging work and total
time, then we should probably just stick with that.
Xiaobing Li Dec. 25, 2023, 2:36 a.m. UTC | #4
On 12/22/23 07:31 AM, Jens Axboe wrote:
> Yep that would work, just leave the stats calculation to the tool
> querying it. Which is really how it should be.
>
>> In addition, on register opcode - generally it is used for resource like
>> buffers, handles etc.. I am not sure how that can help here. If you have
>> something in mind, could you please elaborate in more detail?
>
> It's also a bit of a dumping ground for any kind of out-of-band
> mechanism, so it would work fine for something like this too. But since
> we already have fdinfo and with your idea of just logging work and total
> time, then we should probably just stick with that.

Thanks for explained. I'll send out a v6.
diff mbox series

Patch

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 976e9500f651..b0f9d296c5aa 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -64,6 +64,7 @@  __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	unsigned int sq_shift = 0;
 	unsigned int sq_entries, cq_entries;
 	int sq_pid = -1, sq_cpu = -1;
+	int sq_busy = 0;
 	bool has_lock;
 	unsigned int i;
 
@@ -147,10 +148,13 @@  __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 
 		sq_pid = sq->task_pid;
 		sq_cpu = sq->sq_cpu;
+		if (sq->total_time != 0)
+			sq_busy = (int)(sq->work_time * 100 / sq->total_time);
 	}
 
 	seq_printf(m, "SqThread:\t%d\n", sq_pid);
 	seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu);
+	seq_printf(m, "SqBusy:\t%d%%\n", sq_busy);
 	seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);
 	for (i = 0; has_lock && i < ctx->nr_user_files; i++) {
 		struct file *f = io_file_from_index(&ctx->file_table, i);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 65b5dbe3c850..9b74e344c52a 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -225,6 +225,7 @@  static int io_sq_thread(void *data)
 	struct io_ring_ctx *ctx;
 	unsigned long timeout = 0;
 	char buf[TASK_COMM_LEN];
+	unsigned long sq_start, sq_work_begin, sq_work_end;
 	DEFINE_WAIT(wait);
 
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
@@ -241,6 +242,7 @@  static int io_sq_thread(void *data)
 	}
 
 	mutex_lock(&sqd->lock);
+	sq_start = jiffies;
 	while (1) {
 		bool cap_entries, sqt_spin = false;
 
@@ -251,6 +253,7 @@  static int io_sq_thread(void *data)
 		}
 
 		cap_entries = !list_is_singular(&sqd->ctx_list);
+		sq_work_begin = jiffies;
 		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
 			int ret = __io_sq_thread(ctx, cap_entries);
 
@@ -260,6 +263,11 @@  static int io_sq_thread(void *data)
 		if (io_run_task_work())
 			sqt_spin = true;
 
+		sq_work_end = jiffies;
+		sqd->total_time = sq_work_end - sq_start;
+		if (sqt_spin == true)
+			sqd->work_time += sq_work_end - sq_work_begin;
+
 		if (sqt_spin || !time_after(jiffies, timeout)) {
 			if (sqt_spin)
 				timeout = jiffies + sqd->sq_thread_idle;
diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
index 8df37e8c9149..92b4b07220fa 100644
--- a/io_uring/sqpoll.h
+++ b/io_uring/sqpoll.h
@@ -16,6 +16,8 @@  struct io_sq_data {
 	pid_t			task_pid;
 	pid_t			task_tgid;
 
+	unsigned long		work_time;
+	unsigned long		total_time;
 	unsigned long		state;
 	struct completion	exited;
 };