Message ID: 20230924124410.897646-1-mlevitsk@redhat.com (mailing list archive)
Series: KVM: x86: tracepoint updates
On Sun, Sep 24, 2023, Maxim Levitsky wrote:
> This patch series is intended to add some selected information
> to the kvm tracepoints to make it easier to gather insights about
> running nested guests.
>
> This patch series was developed together with a new x86 performance analysis tool
> that I developed recently (https://gitlab.com/maximlevitsky/kvmon)
> which aims to be a better kvm_stat, and allows you at a glance
> to see what is happening in a VM, including nesting.

Rather than add more and more tracepoints, I think we should be more thoughtful
about (a) where we place KVM's tracepoints and (b) giving userspace the necessary
hooks to write BPF programs to extract whatever data is needed at any given time.

There's simply no way we can iterate fast enough in KVM tracepoints to adapt to
userspace's debug/monitoring needs. E.g. if it turns out someone wants detailed
info on hypercalls that use memory or registers beyond ABCD, the new tracepoints
won't help them.

If all KVM tracepoints grab "struct kvm_vcpu" and force VMCS "registers" to be
cached (or decached, depending on one's viewpoint), then I think that'll serve 99%
of use cases. E.g. the vCPU gives a BPF program kvm_vcpu, vcpu_{vmx,svm}, kvm, etc.

trace_kvm_exit is a good example: despite all of the information that is captured
by KVM, it's borderline worthless for CPUID and MSR exits because their interesting
information is held in registers and not captured in the VMCS or VMCB.

There are some BTF type info issues that I've encountered, but I suspect that's
as much a PEBKAC problem as anything.
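[As a rough illustration of the BPF-piggybacking approach described above (a hedged editorial sketch, not part of the thread): a bpftrace one-liner can attach to the existing kvm:kvm_exit tracepoint and aggregate exits per reason; the `exit_reason` field name follows the trace event's format file, which varies across kernel versions, and running this requires root on a host with KVM active.]

```
# Count VM-exits per exit reason by piggybacking the existing
# kvm:kvm_exit tracepoint -- no new tracepoints needed.
bpftrace -e 'tracepoint:kvm:kvm_exit { @exits[args->exit_reason] = count(); }'
```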
On Mon, 2023-09-25 at 17:03 -0700, Sean Christopherson wrote:
> On Sun, Sep 24, 2023, Maxim Levitsky wrote:
> > This patch series is intended to add some selected information
> > to the kvm tracepoints to make it easier to gather insights about
> > running nested guests.
> >
> > This patch series was developed together with a new x86 performance analysis tool
> > that I developed recently (https://gitlab.com/maximlevitsky/kvmon)
> > which aims to be a better kvm_stat, and allows you at a glance
> > to see what is happening in a VM, including nesting.
>
> Rather than add more and more tracepoints, I think we should be more thoughtful
> about (a) where we place KVM's tracepoints and (b) giving userspace the necessary
> hooks to write BPF programs to extract whatever data is needed at any given time.
>
> There's simply no way we can iterate fast enough in KVM tracepoints to adapt to
> userspace's debug/monitoring needs. E.g. if it turns out someone wants detailed
> info on hypercalls that use memory or registers beyond ABCD, the new tracepoints
> won't help them.
>
> If all KVM tracepoints grab "struct kvm_vcpu" and force VMCS "registers" to be
> cached (or decached, depending on one's viewpoint), then I think that'll serve 99%
> of use cases. E.g. the vCPU gives a BPF program kvm_vcpu, vcpu_{vmx,svm}, kvm, etc.
>
> trace_kvm_exit is a good example: despite all of the information that is captured
> by KVM, it's borderline worthless for CPUID and MSR exits because their interesting
> information is held in registers and not captured in the VMCS or VMCB.
>
> There are some BTF type info issues that I've encountered, but I suspect that's
> as much a PEBKAC problem as anything.

While eBPF has its use cases, none of the extra tracepoints were added solely because of
the monitoring tool, and I do understand that tracepoints are a limited resource.

Each added tracepoint/info was added only when it was also found to be useful for regular
kvm tracing.

Best regards,
	Maxim Levitsky
On 9/26/23 10:28, Maxim Levitsky wrote:
>> trace_kvm_exit is a good example: despite all of the information that is captured
>> by KVM, it's borderline worthless for CPUID and MSR exits because their interesting
>> information is held in registers and not captured in the VMCS or VMCB.
>>
>> There are some BTF type info issues that I've encountered, but I suspect that's
>> as much a PEBKAC problem as anything.
>
> While eBPF has its use cases, none of the extra tracepoints were added solely because of
> the monitoring tool, and I do understand that tracepoints are a limited resource.
>
> Each added tracepoint/info was added only when it was also found to be useful for regular
> kvm tracing.

I am not sure about _all_ of them, but I agree with both of you.

On one hand, it would be pretty cool to have eBPF access to registers. On
the other hand, the specific info you're adding is generic, and I think there
are only a couple of exceptions where I am not sure it belongs in the generic
KVM tracepoints.

Paolo
On Tue, Sep 26, 2023, Paolo Bonzini wrote:
> On 9/26/23 10:28, Maxim Levitsky wrote:
> > > trace_kvm_exit is a good example: despite all of the information that is captured
> > > by KVM, it's borderline worthless for CPUID and MSR exits because their interesting
> > > information is held in registers and not captured in the VMCS or VMCB.
> > >
> > > There are some BTF type info issues that I've encountered, but I suspect that's
> > > as much a PEBKAC problem as anything.
> >
> > While eBPF has its use cases, none of the extra tracepoints were added solely because of
> > the monitoring tool, and I do understand that tracepoints are a limited resource.
> >
> > Each added tracepoint/info was added only when it was also found to be useful for regular
> > kvm tracing.
>
> I am not sure about _all_ of them, but I agree with both of you.
>
> On one hand, it would be pretty cool to have eBPF access to registers. On
> the other hand, the specific info you're adding is generic, and I think there
> are only a couple of exceptions where I am not sure it belongs in the generic
> KVM tracepoints.

I'm not saying this information isn't useful, *sometimes*. What I'm saying is that
I don't think it's sustainable/maintainable to keep expanding KVM's tracepoints.
E.g. why trace req_immediate_exit and not mmu_invalidate_seq[*]? It's practically
impossible to predict exactly what information will be useful in the field. And for
something like kvmon, IMO the onus is on userspace to do the heavy lifting.

Rather than hardcode mounds of information in KVM's tracepoints, I think we should
refactor KVM to make it as easy as possible to use BPF to piggyback tracepoints and
get at data that *might* be interesting, and then add a variety of BPF programs
(e.g. using bpftrace for simplicity) somewhere in tools/ to provide equivalent
functionality to select existing tracepoints. E.g. if we rework kvm_vcpu to place
"struct kvm_vcpu_arch arch" at offset '0', then we can get at all the GPRs and
pseudo-registers by hardcoding offsets into the struct, i.e. without needing BTF
type info. More elaborate trace programs would likely need BTF, or maybe some
clever shenanigans, but it seems very doable.

[*] https://lore.kernel.org/all/ZOaUdP46f8XjQvio@google.com