
KVM: introduce vm's max_halt_poll_ns to debugfs

Message ID 20240508184743778PSWkv_r8dMoye7WmZ7enP@zte.com.cn (mailing list archive)
State: New, archived
Series: KVM: introduce vm's max_halt_poll_ns to debugfs

Commit Message

Cheng Lin May 8, 2024, 10:47 a.m. UTC
From: Cheng Lin <cheng.lin130@zte.com.cn>

Introduce vm's max_halt_poll_ns and override_halt_poll_ns to
debugfs. Provide a way to check and modify them.

Signed-off-by: Cheng Lin <cheng.lin130@zte.com.cn>
---
 virt/kvm/kvm_main.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Sean Christopherson May 8, 2024, 3:58 p.m. UTC | #1
On Wed, May 08, 2024, cheng.lin130@zte.com.cn wrote:
> From: Cheng Lin <cheng.lin130@zte.com.cn>
> 
> Introduce vm's max_halt_poll_ns and override_halt_poll_ns to
> debugfs. Provide a way to check and modify them.

Why?
Cheng Lin May 9, 2024, 2:30 a.m. UTC | #2
> From: seanjc <seanjc@google.com>
> > From: Cheng Lin <cheng.lin130@zte.com.cn>
> >
> > Introduce vm's max_halt_poll_ns and override_halt_poll_ns to
> > debugfs. Provide a way to check and modify them.
> Why?
If a vm's max_halt_poll_ns has been set using KVM_CAP_HALT_POLL,
the module parameter kvm.halt_poll_ns will no longer indicate the maximum
halt polling interval for that vm. After introducing these two attributes into
debugfs, they can be used to check whether the vm's individual configuration
is enabled and what the working value is.
This patch provides a way to check and modify them through debugfs.
Sean Christopherson May 9, 2024, 2:59 p.m. UTC | #3
On Thu, May 09, 2024, cheng.lin130@zte.com.cn wrote:
> > From: seanjc <seanjc@google.com>
> > > From: Cheng Lin <cheng.lin130@zte.com.cn>
> > >
> > > Introduce vm's max_halt_poll_ns and override_halt_poll_ns to
> > > debugfs. Provide a way to check and modify them.
> > Why?
> If a vm's max_halt_poll_ns has been set using KVM_CAP_HALT_POLL,
> the module parameter kvm.halt_poll_ns will no longer indicate the maximum
> halt polling interval for that vm. After introducing these two attributes into
> debugfs, they can be used to check whether the vm's individual configuration
> is enabled and what the working value is.

But why is max_halt_poll_ns special enough to warrant debugfs entries?  There is
a _lot_ of state in KVM that is configurable per-VM, it simply isn't feasible to
dump everything into debugfs.

I do think it would be reasonable to capture the max allowed polling time in
the existing tracepoint though, e.g.

diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 74e40d5d4af4..7e66e9b2e497 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -41,24 +41,26 @@ TRACE_EVENT(kvm_userspace_exit,
 );
 
 TRACE_EVENT(kvm_vcpu_wakeup,
-           TP_PROTO(__u64 ns, bool waited, bool valid),
-           TP_ARGS(ns, waited, valid),
+           TP_PROTO(__u64 ns, __u32 max_ns, bool waited, bool valid),
+           TP_ARGS(ns, max_ns, waited, valid),
 
        TP_STRUCT__entry(
                __field(        __u64,          ns              )
+               __field(        __u32,          max_ns          )
                __field(        bool,           waited          )
                __field(        bool,           valid           )
        ),
 
        TP_fast_assign(
                __entry->ns             = ns;
+               __entry->max_ns         = max_ns;
                __entry->waited         = waited;
                __entry->valid          = valid;
        ),
 
-       TP_printk("%s time %lld ns, polling %s",
+       TP_printk("%s time %llu ns (max poll %u ns), polling %s",
                  __entry->waited ? "wait" : "poll",
-                 __entry->ns,
+                 __entry->ns, __entry->max_ns,
                  __entry->valid ? "valid" : "invalid")
 );
 
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2e388972d856..f093138f3cd7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3846,7 +3846,8 @@ void kvm_vcpu_halt(struct kvm_vcpu *vcpu)
                }
        }
 
-       trace_kvm_vcpu_wakeup(halt_ns, waited, vcpu_valid_wakeup(vcpu));
+       trace_kvm_vcpu_wakeup(halt_ns, max_halt_poll_ns, waited,
+                             vcpu_valid_wakeup(vcpu));
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_halt);
Cheng Lin May 10, 2024, 3:18 a.m. UTC | #4
> > > From: seanjc <seanjc@google.com>
> > > > From: Cheng Lin <cheng.lin130@zte.com.cn>
> > > >
> > > > Introduce vm's max_halt_poll_ns and override_halt_poll_ns to
> > > > debugfs. Provide a way to check and modify them.
> > > Why?
> > If a vm's max_halt_poll_ns has been set using KVM_CAP_HALT_POLL,
> > the module parameter kvm.halt_poll_ns will no longer indicate the maximum
> > halt polling interval for that vm. After introducing these two attributes into
> > debugfs, they can be used to check whether the vm's individual configuration
> > is enabled and what the working value is.
> But why is max_halt_poll_ns special enough to warrant debugfs entries?  There is
> a _lot_ of state in KVM that is configurable per-VM, it simply isn't feasible to
> dump everything into debugfs.
If we want to provide a direct modification interface under /sys for a vm's
max_halt_poll_ns, like the module parameter /sys/module/kvm/parameters/halt_poll_ns,
using debugfs may be worthwhile.
Further, if the override_halt_poll_ns entry under debugfs is made writable, it could
even set the per-vm max_halt_poll_ns, as the KVM_CAP_HALT_POLL interface
does.
> I do think it would be reasonable to capture the max allowed polling time in
> the existing tracepoint though, e.g.
Yes, I agree.
It is sufficient to get the per-vm max_halt_poll_ns through the tracepoint if
KVM_CAP_HALT_POLL is used as the only setting interface.

Do you think it is worth providing a setting interface other than KVM_CAP_HALT_POLL?
Sean Christopherson May 10, 2024, 2:07 p.m. UTC | #5
On Fri, May 10, 2024, cheng.lin130@zte.com.cn wrote:
> > > > From: seanjc <seanjc@google.com>
> > > > > From: Cheng Lin <cheng.lin130@zte.com.cn>
> > > > >
> > > > > Introduce vm's max_halt_poll_ns and override_halt_poll_ns to
> > > > > debugfs. Provide a way to check and modify them.
> > > > Why?
> > > If a vm's max_halt_poll_ns has been set using KVM_CAP_HALT_POLL,
> > > the module parameter kvm.halt_poll_ns will no longer indicate the maximum
> > > halt polling interval for that vm. After introducing these two attributes into
> > > debugfs, they can be used to check whether the vm's individual configuration
> > > is enabled and what the working value is.
> > But why is max_halt_poll_ns special enough to warrant debugfs entries?  There is
> > a _lot_ of state in KVM that is configurable per-VM, it simply isn't feasible to
> > dump everything into debugfs.
> If we want to provide a direct modification interface under /sys for a vm's
> max_halt_poll_ns, like the module parameter /sys/module/kvm/parameters/halt_poll_ns,
> using debugfs may be worthwhile.

Yes, but _why_?  I know _what_ a debugfs knob allows, but you have yet to explain
why this particular knob is needed.

Generally speaking, functionality of any kind should not be routed through debugfs,
it really is meant for debug.  E.g. it's typically root-only, is not guaranteed
to exist, its population is best-effort, etc.

> Further, if the override_halt_poll_ns entry under debugfs is made writable, it could
> even set the per-vm max_halt_poll_ns, as the KVM_CAP_HALT_POLL interface
> does.
> > I do think it would be reasonable to capture the max allowed polling time in
> > the existing tracepoint though, e.g.
> Yes, I agree.
> It is sufficient to get the per-vm max_halt_poll_ns through the tracepoint if
> KVM_CAP_HALT_POLL is used as the only setting interface.
> 
> Do you think it is worth providing a setting interface other than KVM_CAP_HALT_POLL?
Cheng Lin May 11, 2024, 2:34 a.m. UTC | #6
> > > > > From: seanjc <seanjc@google.com>
> > > > > > From: Cheng Lin <cheng.lin130@zte.com.cn>
> > > > > >
> > > > > > Introduce vm's max_halt_poll_ns and override_halt_poll_ns to
> > > > > > debugfs. Provide a way to check and modify them.
> > > > > Why?
> > > > If a vm's max_halt_poll_ns has been set using KVM_CAP_HALT_POLL,
> > > > the module parameter kvm.halt_poll_ns will no longer indicate the maximum
> > > > halt polling interval for that vm. After introducing these two attributes into
> > > > debugfs, they can be used to check whether the vm's individual configuration
> > > > is enabled and what the working value is.
> > > But why is max_halt_poll_ns special enough to warrant debugfs entries?  There is
> > > a _lot_ of state in KVM that is configurable per-VM, it simply isn't feasible to
> > > dump everything into debugfs.
> > If we want to provide a direct modification interface under /sys for a vm's
> > max_halt_poll_ns, like the module parameter /sys/module/kvm/parameters/halt_poll_ns,
> > using debugfs may be worthwhile.
> Yes, but _why_?  I know _what_ a debugfs knob allows, but you have yet to explain
> why this particular knob is needed.
I think that if such an interface is provided, it can be used to check the source of
a vm's max_halt_poll_ns: the general module parameter or the per-vm configuration.
When configured per-vm, such an interface can be used to monitor that
configuration. If there is an error in the value set through KVM_CAP_HALT_POLL, such
an interface can be used to fix or reset it dynamically.
> Generally speaking, functionality of any kind should not be routed through debugfs,
> it really is meant for debug.  E.g. it's typically root-only, is not guaranteed
> to exist, its population is best-effort, etc.
> > Further, if the override_halt_poll_ns entry under debugfs is made writable, it could
> > even set the per-vm max_halt_poll_ns, as the KVM_CAP_HALT_POLL interface
> > does.
> > > I do think it would be reasonable to capture the max allowed polling time in
> > > the existing tracepoint though, e.g.
> > Yes, I agree.
> > It is sufficient to get the per-vm max_halt_poll_ns through the tracepoint if
> > KVM_CAP_HALT_POLL is used as the only setting interface.
> >
> > Do you think it is worth providing a setting interface other than KVM_CAP_HALT_POLL?
Sean Christopherson May 14, 2024, 10 p.m. UTC | #7
On Sat, May 11, 2024, cheng.lin130@zte.com.cn wrote:
> > > > > > From: seanjc <seanjc@google.com>
> > > > > > > From: Cheng Lin <cheng.lin130@zte.com.cn>
> > > > > > >
> > > > > > > Introduce vm's max_halt_poll_ns and override_halt_poll_ns to
> > > > > > > debugfs. Provide a way to check and modify them.
> > > > > > Why?
> > > > > If a vm's max_halt_poll_ns has been set using KVM_CAP_HALT_POLL,
> > > > > the module parameter kvm.halt_poll_ns will no longer indicate the maximum
> > > > > halt polling interval for that vm. After introducing these two attributes into
> > > > > debugfs, they can be used to check whether the vm's individual configuration
> > > > > is enabled and what the working value is.
> > > > But why is max_halt_poll_ns special enough to warrant debugfs entries?  There is
> > > > a _lot_ of state in KVM that is configurable per-VM, it simply isn't feasible to
> > > > dump everything into debugfs.
> > > If we want to provide a direct modification interface under /sys for a vm's
> > > max_halt_poll_ns, like the module parameter /sys/module/kvm/parameters/halt_poll_ns,
> > > using debugfs may be worthwhile.
> > Yes, but _why_?  I know _what_ a debugfs knob allows, but you have yet to explain
> > why this particular knob is needed.
> I think that if such an interface is provided, it can be used to check the source of
> a vm's max_halt_poll_ns: the general module parameter or the per-vm configuration.
> When configured per-vm, such an interface can be used to monitor that
> configuration. If there is an error in the value set through KVM_CAP_HALT_POLL, such
> an interface can be used to fix or reset it dynamically.

But again, that argument can be made for myriad settings in KVM.  And unlike many
settings, a "bad" max_halt_poll_ns can be fixed simply by redoing KVM_CAP_HALT_POLL.

It's not KVM's responsibility to police userspace for bugs/errors, and IMO a
backdoor into max_halt_poll_ns isn't justified.
Cheng Lin May 15, 2024, 4:03 a.m. UTC | #8
> > > Yes, but _why_?  I know _what_ a debugfs knob allows, but you have yet to explain
> > > why this particular knob is needed.
> > I think that if such an interface is provided, it can be used to check the source of
> > a vm's max_halt_poll_ns: the general module parameter or the per-vm configuration.
> > When configured per-vm, such an interface can be used to monitor that
> > configuration. If there is an error in the value set through KVM_CAP_HALT_POLL, such
> > an interface can be used to fix or reset it dynamically.
> But again, that argument can be made for myriad settings in KVM.  And unlike many
> settings, a "bad" max_halt_poll_ns can be fixed simply by redoing KVM_CAP_HALT_POLL.
Yes, though whether it is convenient to redo it will depend on the userspace.
> It's not KVM's responsibility to police userspace for bugs/errors, and IMO a
> backdoor into max_halt_poll_ns isn't justified.
Yes, it's not KVM's responsibility to police userspace. But besides depending on
userspace to redo the setting, this interface could serve as a plan B to ensure that
the VM works as expected.

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff0a20565..60dae952c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1151,6 +1151,11 @@  static int kvm_create_vm_debugfs(struct kvm *kvm, const char *fdname)
 				    &stat_fops_per_vm);
 	}

+	debugfs_create_bool("override_halt_poll_ns", 0444, kvm->debugfs_dentry,
+			    &kvm->override_halt_poll_ns);
+	debugfs_create_u32("max_halt_poll_ns", 0644, kvm->debugfs_dentry,
+			   &kvm->max_halt_poll_ns);
+
 	kvm_arch_create_vm_debugfs(kvm);
 	return 0;
 out_err: