| Message ID | 20180201125441.2f5b4fdd@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
2018-02-01 12:54-0500, Luiz Capitulino:
>
> Libvirt needs to know when a vCPU is halted. To get this information,

I don't see why upper level management should care about that, a single
bit about halted state that can be incorrect at the time it is processed
seems of very limited use.

(A much more sensible data point would be the fraction of time when the
VCPU was running or runnable, which is roughly what you get by sampling
the halted state.)

A halted vCPU might even be halted in guest mode, so KVM doesn't know
about that state (unless you force a VM exit), which would complicate
the collection a bit more ... but really, what is the data being used
for?

Users might care about the state, for obscure reasons, but that isn't a
performance problem.

> libvirt has started using the query-cpus command from QEMU. However,
> if the in-kernel irqchip is in use, query-cpus will force all vCPUs
> to user-space since they have to issue the KVM_GET_MP_STATE ioctl.

Libvirt knows if KVM exits to userspace on halts, so it can just query
QEMU in that case. In the other case, there is a very dirty "solution"
that works on all architectures right now:

  grep kvm_vcpu_block /proc/$vcpu_task/stack

If you get something, the vCPU is halted in KVM.

> This has catastrophic implications for low-latency workloads like
> KVM-RT and zero packet loss with DPDK. To make matters worse, there's
> an OpenStack service called ceilometer that causes libvirt to
> issue query-cpus every few minutes.

I'd expect that people running these workloads can set up the system. :(

I bet that ceilometer just mindlessly collects everything, so we should
be able to configure libvirt to collect only some stats. Either libvirt
or an upper layer would decide what is too expensive for its usefulness.

> The solution proposed in this patch is to export the vCPU
> halted state in the already existing vcpu directory in sysfs.
> This way, libvirt can read the vCPU halted state from sysfs and avoid
> using the query-cpus command. This solution seems to be sufficient
> for libvirt's needs, but it has the following cons:
>
>  * vcpu information in sysfs lives in a debug directory, so
>    libvirt would be basing its API on debug info

(It pains me to say there probably already are tools that depend on
kvm/debug.)

It's slightly better than the stack hack, but needs more code in the
kernel and the interface is in a gray compatibility zone, so I'd like to
know why userspace does that in the first place.

>  * Currently, only x86 supports the vcpu dir in sysfs, so
>    we'd have to expand this to other archs (should be doable)
>
> If we agree that this solution is feasible, I'll work on extending
> the vcpu debug information to other archs for my next posting.
>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> @@ -6273,6 +6273,7 @@ void kvm_arch_exit(void)
>
>  int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
>  {
> +	kvm_vcpu_set_halted(vcpu);

There is no point in caring about !lapic_in_kernel(). I'd move the logic
into vcpu_block() to be shared among all architectures.

>  	++vcpu->stat.halt_exits;
>  	if (lapic_in_kernel(vcpu)) {
>  		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
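[Editor's illustration: a minimal sketch of what Radim's suggestion could
look like, assuming the flag is updated in the arch-independent blocking
path instead of x86's kvm_vcpu_halt(). The kvm_vcpu_set_halted() and
kvm_vcpu_clear_halted() helpers are hypothetical, and the wait loop is
abbreviated from the era's virt/kvm/kvm_main.c; this is not a drop-in
patch.]

    /*
     * Sketch only: maintain the proposed halted flag in the shared
     * blocking path so every architecture gets it for free. Helper
     * names are assumptions; the wait loop is abbreviated.
     */
    void kvm_vcpu_block(struct kvm_vcpu *vcpu)
    {
    	DECLARE_SWAITQUEUE(wait);

    	kvm_vcpu_set_halted(vcpu);	/* hypothetical: mark vCPU halted */

    	for (;;) {
    		prepare_to_swait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE);
    		if (kvm_vcpu_check_block(vcpu) < 0)
    			break;
    		schedule();
    	}
    	finish_swait(&vcpu->wq, &wait);

    	kvm_vcpu_clear_halted(vcpu);	/* hypothetical: runnable again */
    }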
On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> 2018-02-01 12:54-0500, Luiz Capitulino:
> >
> > Libvirt needs to know when a vCPU is halted. To get this information,
>
> I don't see why upper level management should care about that, a single
> bit about halted state that can be incorrect at the time it is processed
> seems of very limited use.

I don't see why, either.

I'm CCing libvir-list and the people involved in the code that
added halt state to libvirt domain statistics.

> (A much more sensible data point would be the fraction of time when the
> VCPU was running or runnable, which is roughly what you get by sampling
> the halted state.)
>
> A halted vCPU might even be halted in guest mode, so KVM doesn't know
> about that state (unless you force a VM exit), which would complicate
> the collection a bit more ... but really, what is the data being used
> for?
>
> Users might care about the state, for obscure reasons, but that isn't a
> performance problem.
>
> > libvirt has started using the query-cpus command from QEMU. However,
> > if the in-kernel irqchip is in use, query-cpus will force all vCPUs
> > to user-space since they have to issue the KVM_GET_MP_STATE ioctl.
>
> Libvirt knows if KVM exits to userspace on halts, so it can just query
> QEMU in that case. In the other case, there is a very dirty "solution"
> that works on all architectures right now:
>
>   grep kvm_vcpu_block /proc/$vcpu_task/stack
>
> If you get something, the vCPU is halted in KVM.

Nice.

> > This has catastrophic implications for low-latency workloads like
> > KVM-RT and zero packet loss with DPDK. To make matters worse, there's
> > an OpenStack service called ceilometer that causes libvirt to
> > issue query-cpus every few minutes.
>
> I'd expect that people running these workloads can set up the system. :(
>
> I bet that ceilometer just mindlessly collects everything, so we should
> be able to configure libvirt to collect only some stats. Either libvirt
> or an upper layer would decide what is too expensive for its usefulness.

Yes. Including expensive-to-collect halt state in VIR_DOMAIN_STATS_VCPU
is a serious performance regression in libvirt.

> > The solution proposed in this patch is to export the vCPU
> > halted state in the already existing vcpu directory in sysfs.
> > This way, libvirt can read the vCPU halted state from sysfs and avoid
> > using the query-cpus command. This solution seems to be sufficient
> > for libvirt's needs, but it has the following cons:
> >
> >  * vcpu information in sysfs lives in a debug directory, so
> >    libvirt would be basing its API on debug info
>
> (It pains me to say there probably already are tools that depend on
> kvm/debug.)
>
> It's slightly better than the stack hack, but needs more code in the
> kernel and the interface is in a gray compatibility zone, so I'd like to
> know why userspace does that in the first place.
>
> >  * Currently, only x86 supports the vcpu dir in sysfs, so
> >    we'd have to expand this to other archs (should be doable)
> >
> > If we agree that this solution is feasible, I'll work on extending
> > the vcpu debug information to other archs for my next posting.
> >
> > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> > ---
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > @@ -6273,6 +6273,7 @@ void kvm_arch_exit(void)
> >
> >  int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >  {
> > +	kvm_vcpu_set_halted(vcpu);
>
> There is no point in caring about !lapic_in_kernel(). I'd move the logic
> into vcpu_block() to be shared among all architectures.
>
> >  	++vcpu->stat.halt_exits;
> >  	if (lapic_in_kernel(vcpu)) {
> >  		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> 2018-02-01 12:54-0500, Luiz Capitulino:
> >
> > Libvirt needs to know when a vCPU is halted. To get this information,
>
> I don't see why upper level management should care about that, a single
> bit about halted state that can be incorrect at the time it is processed
> seems of very limited use.
>
> (A much more sensible data point would be the fraction of time when the
> VCPU was running or runnable, which is roughly what you get by sampling
> the halted state.)
>
> A halted vCPU might even be halted in guest mode, so KVM doesn't know
> about that state (unless you force a VM exit), which would complicate
> the collection a bit more ... but really, what is the data being used
> for?

I'm unclear what usage is being made of the halted state info, as it was
an addition to libvirt from a 3rd party. My guess is that it was used to
see if the vCPU had been brought online by the guest OS at all, after
being hotplugged. As you say, the info is pretty useless once the vCPU
is actually online & in use, because it changes so quickly.

> Users might care about the state, for obscure reasons, but that isn't a
> performance problem.
>
> > libvirt has started using the query-cpus command from QEMU. However,
> > if the in-kernel irqchip is in use, query-cpus will force all vCPUs
> > to user-space since they have to issue the KVM_GET_MP_STATE ioctl.
>
> Libvirt knows if KVM exits to userspace on halts, so it can just query
> QEMU in that case. In the other case, there is a very dirty "solution"
> that works on all architectures right now:
>
>   grep kvm_vcpu_block /proc/$vcpu_task/stack
>
> If you get something, the vCPU is halted in KVM.
>
> > This has catastrophic implications for low-latency workloads like
> > KVM-RT and zero packet loss with DPDK. To make matters worse, there's
> > an OpenStack service called ceilometer that causes libvirt to
> > issue query-cpus every few minutes.
>
> I'd expect that people running these workloads can set up the system. :(
>
> I bet that ceilometer just mindlessly collects everything, so we should
> be able to configure libvirt to collect only some stats. Either libvirt
> or an upper layer would decide what is too expensive for its usefulness.
>
> > The solution proposed in this patch is to export the vCPU
> > halted state in the already existing vcpu directory in sysfs.
> > This way, libvirt can read the vCPU halted state from sysfs and avoid
> > using the query-cpus command. This solution seems to be sufficient
> > for libvirt's needs, but it has the following cons:
> >
> >  * vcpu information in sysfs lives in a debug directory, so
> >    libvirt would be basing its API on debug info
>
> (It pains me to say there probably already are tools that depend on
> kvm/debug.)
>
> It's slightly better than the stack hack, but needs more code in the
> kernel and the interface is in a gray compatibility zone, so I'd like to
> know why userspace does that in the first place.
>
> >  * Currently, only x86 supports the vcpu dir in sysfs, so
> >    we'd have to expand this to other archs (should be doable)
> >
> > If we agree that this solution is feasible, I'll work on extending
> > the vcpu debug information to other archs for my next posting.
> >
> > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> > ---
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > @@ -6273,6 +6273,7 @@ void kvm_arch_exit(void)
> >
> >  int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >  {
> > +	kvm_vcpu_set_halted(vcpu);
>
> There is no point in caring about !lapic_in_kernel(). I'd move the logic
> into vcpu_block() to be shared among all architectures.
>
> >  	++vcpu->stat.halt_exits;
> >  	if (lapic_in_kernel(vcpu)) {
> >  		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;

Regards,
Daniel
On Thu, Feb 01, 2018 at 12:54:41PM -0500, Luiz Capitulino wrote:
>
> Libvirt needs to know when a vCPU is halted. To get this information,
> libvirt has started using the query-cpus command from QEMU. However,
> if the in-kernel irqchip is in use, query-cpus will force all vCPUs
> to user-space since they have to issue the KVM_GET_MP_STATE ioctl.
> This has catastrophic implications for low-latency workloads like
> KVM-RT and zero packet loss with DPDK. To make matters worse, there's
> an OpenStack service called ceilometer that causes libvirt to
> issue query-cpus every few minutes.
>
> The solution proposed in this patch is to export the vCPU
> halted state in the already existing vcpu directory in sysfs.
> This way, libvirt can read the vCPU halted state from sysfs and avoid
> using the query-cpus command. This solution seems to be sufficient
> for libvirt's needs, but it has the following cons:
>
>  * vcpu information in sysfs lives in a debug directory, so
>    libvirt would be basing its API on debug info

Is this part of the regular sysfs mount point, or does it require
debugfs to be mounted separately at /sys/kernel/debug?

>  * Currently, only x86 supports the vcpu dir in sysfs, so
>    we'd have to expand this to other archs (should be doable)

Yep, that would be fairly important.

Regards,
Daniel
On Fri, 2 Feb 2018 12:47:50 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> > 2018-02-01 12:54-0500, Luiz Capitulino:
> > >
> > > Libvirt needs to know when a vCPU is halted. To get this information,
> >
> > I don't see why upper level management should care about that, a single
> > bit about halted state that can be incorrect at the time it is processed
> > seems of very limited use.
> >
> > (A much more sensible data point would be the fraction of time when the
> > VCPU was running or runnable, which is roughly what you get by sampling
> > the halted state.)
> >
> > A halted vCPU might even be halted in guest mode, so KVM doesn't know
> > about that state (unless you force a VM exit), which would complicate
> > the collection a bit more ... but really, what is the data being used
> > for?
>
> I'm unclear what usage is being made of the halted state info, as it was
> an addition to libvirt from a 3rd party. My guess is that it was used to
> see if the vCPU had been brought online by the guest OS at all, after
> being hotplugged. As you say, the info is pretty useless once the vCPU
> is actually online & in use, because it changes so quickly.

I had assumed libvirt wanted to know if the vCPU was idle. Sure there
are other ways to do this, but I also assumed KVM/QEMU wanted to export
this state to upper management. If we don't want to do that, then we'll
have to deprecate the "halted" field in query-cpus in QEMU, since that's
how we got here.

So, I think this is blocked on Viktor's feedback to clarify the halted
state usage. I'm CC'ing him again in this thread (although Eduardo
already did that).

> > Users might care about the state, for obscure reasons, but that isn't a
> > performance problem.
> >
> > > libvirt has started using the query-cpus command from QEMU. However,
> > > if the in-kernel irqchip is in use, query-cpus will force all vCPUs
> > > to user-space since they have to issue the KVM_GET_MP_STATE ioctl.
> >
> > Libvirt knows if KVM exits to userspace on halts, so it can just query
> > QEMU in that case. In the other case, there is a very dirty "solution"
> > that works on all architectures right now:
> >
> >   grep kvm_vcpu_block /proc/$vcpu_task/stack
> >
> > If you get something, the vCPU is halted in KVM.

Well, I'm still not sure it's a good idea to have the halted bit in
debugfs, since that's for debugging and not for building management
APIs. What you propose is even more problematic, since now the API
depends on a KVM internal function name...

> > > This has catastrophic implications for low-latency workloads like
> > > KVM-RT and zero packet loss with DPDK. To make matters worse, there's
> > > an OpenStack service called ceilometer that causes libvirt to
> > > issue query-cpus every few minutes.
> >
> > I'd expect that people running these workloads can set up the system. :(
> >
> > I bet that ceilometer just mindlessly collects everything, so we should
> > be able to configure libvirt to collect only some stats. Either libvirt
> > or an upper layer would decide what is too expensive for its usefulness.
> >
> > > The solution proposed in this patch is to export the vCPU
> > > halted state in the already existing vcpu directory in sysfs.
> > > This way, libvirt can read the vCPU halted state from sysfs and avoid
> > > using the query-cpus command. This solution seems to be sufficient
> > > for libvirt's needs, but it has the following cons:
> > >
> > >  * vcpu information in sysfs lives in a debug directory, so
> > >    libvirt would be basing its API on debug info
> >
> > (It pains me to say there probably already are tools that depend on
> > kvm/debug.)
> >
> > It's slightly better than the stack hack, but needs more code in the
> > kernel and the interface is in a gray compatibility zone, so I'd like to
> > know why userspace does that in the first place.
> >
> > >  * Currently, only x86 supports the vcpu dir in sysfs, so
> > >    we'd have to expand this to other archs (should be doable)
> > >
> > > If we agree that this solution is feasible, I'll work on extending
> > > the vcpu debug information to other archs for my next posting.
> > >
> > > Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> > > ---
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > @@ -6273,6 +6273,7 @@ void kvm_arch_exit(void)
> > >
> > >  int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> > >  {
> > > +	kvm_vcpu_set_halted(vcpu);
> >
> > There is no point in caring about !lapic_in_kernel(). I'd move the logic
> > into vcpu_block() to be shared among all architectures.
> >
> > >  	++vcpu->stat.halt_exits;
> > >  	if (lapic_in_kernel(vcpu)) {
> > >  		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
>
> Regards,
> Daniel
On Fri, 2 Feb 2018 12:49:25 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Thu, Feb 01, 2018 at 12:54:41PM -0500, Luiz Capitulino wrote:
> >
> > Libvirt needs to know when a vCPU is halted. To get this information,
> > libvirt has started using the query-cpus command from QEMU. However,
> > if the in-kernel irqchip is in use, query-cpus will force all vCPUs
> > to user-space since they have to issue the KVM_GET_MP_STATE ioctl.
> > This has catastrophic implications for low-latency workloads like
> > KVM-RT and zero packet loss with DPDK. To make matters worse, there's
> > an OpenStack service called ceilometer that causes libvirt to
> > issue query-cpus every few minutes.
> >
> > The solution proposed in this patch is to export the vCPU
> > halted state in the already existing vcpu directory in sysfs.
> > This way, libvirt can read the vCPU halted state from sysfs and avoid
> > using the query-cpus command. This solution seems to be sufficient
> > for libvirt's needs, but it has the following cons:
> >
> >  * vcpu information in sysfs lives in a debug directory, so
> >    libvirt would be basing its API on debug info
>
> Is this part of the regular sysfs mount point, or does it require
> debugfs to be mounted separately at /sys/kernel/debug?

Yeah, it depends on debugfs being separately mounted at
/sys/kernel/debug.

> >  * Currently, only x86 supports the vcpu dir in sysfs, so
> >    we'd have to expand this to other archs (should be doable)
>
> Yep, that would be fairly important.

That's doable. The code that exports the vcpu information to sysfs is
pretty arch-independent. Only x86 does it today because it's the only
arch that actually exports anything per vcpu (which is the tsc-offset).
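[Editor's illustration: reading such a per-vCPU debugfs file from
userspace could look like the sketch below. The directory layout
/sys/kernel/debug/kvm/<pid>-<fd>/vcpu<N>/ is inferred from where the
existing tsc-offset entries live; the "halted" file name and its
contents are assumptions based on the proposed patch, not an existing
kernel interface.]

    #include <glob.h>
    #include <stdio.h>

    /*
     * Sketch only: read the proposed per-vCPU "halted" flag from
     * debugfs. Assumes a "halted" file next to the existing
     * "tsc-offset"; file name and layout are hypothetical.
     */
    static int vcpu_halted(long qemu_pid, int vcpu)
    {
    	char pattern[128], buf[8];
    	int halted = -1;
    	glob_t g;

    	snprintf(pattern, sizeof(pattern),
    		 "/sys/kernel/debug/kvm/%ld-*/vcpu%d/halted",
    		 qemu_pid, vcpu);
    	if (glob(pattern, 0, NULL, &g) == 0 && g.gl_pathc > 0) {
    		FILE *f = fopen(g.gl_pathv[0], "r");

    		if (f) {
    			if (fgets(buf, sizeof(buf), f))
    				halted = (buf[0] == '1');
    			fclose(f);
    		}
    	}
    	globfree(&g);
    	return halted;	/* 1 = halted, 0 = running, -1 = unknown */
    }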
On 01.02.2018 21:26, Eduardo Habkost wrote:
> On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
>> 2018-02-01 12:54-0500, Luiz Capitulino:
>>>
>>> Libvirt needs to know when a vCPU is halted. To get this information,
>>
>> I don't see why upper level management should care about that, a single
>> bit about halted state that can be incorrect at the time it is processed
>> seems of very limited use.
>
> I don't see why, either.
>
> I'm CCing libvir-list and the people involved in the code that
> added halt state to libvirt domain statistics.
>
I'll try to explain the motivation for the "halted" state exposure and
why it ended in the libvirt domain stats.

s390 CPUs can be present in a system (e.g. after being hotplugged) but
be offline (disabled), in which case they are not used by the operating
system. In Linux, disabled CPUs show a value of '0' in
/sys/devices/system/cpu/cpu<n>/online.

Higher level management software (on top of libvirt) can take advantage
of knowing whether a guest CPU is online and thus used or not.
Specifically, it might not make sense to plug more CPUs if the guest OS
isn't using the CPUs at all.

A disabled guest CPU is represented as halted in the QEMU object model
and can therefore be identified by the QMP query-cpus command.

The initial patch proposal to expose this via virsh vcpuinfo was not
considered to be desirable because there was a concern that legacy
management software might be confused seeing halted vcpus. Therefore the
state information was added to the cpu domain statistics.

One issue we're facing is that the semantics of "halted" are different
between s390 and at least x86. The question might be whether they are
different enough to warrant a specific "disabled" indicator.

[...]
On Fri, 2 Feb 2018 14:53:50 +0100
Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> wrote:

> On 01.02.2018 21:26, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> >> 2018-02-01 12:54-0500, Luiz Capitulino:
> >>>
> >>> Libvirt needs to know when a vCPU is halted. To get this information,
> >>
> >> I don't see why upper level management should care about that, a single
> >> bit about halted state that can be incorrect at the time it is processed
> >> seems of very limited use.
> >
> > I don't see why, either.
> >
> > I'm CCing libvir-list and the people involved in the code that
> > added halt state to libvirt domain statistics.
> >
> I'll try to explain the motivation for the "halted" state exposure and
> why it ended in the libvirt domain stats.
>
> s390 CPUs can be present in a system (e.g. after being hotplugged) but
> be offline (disabled), in which case they are not used by the operating
> system. In Linux, disabled CPUs show a value of '0' in
> /sys/devices/system/cpu/cpu<n>/online.

If that's all you want, have you considered using the guest agent?

> Higher level management software (on top of libvirt) can take advantage
> of knowing whether a guest CPU is online and thus used or not.
> Specifically, it might not make sense to plug more CPUs if the guest OS
> isn't using the CPUs at all.

OK, so what's the algorithm used by the higher level management software
where this all fits together? Something like:

1. Hotplug vCPU
2. Poll "halted" state
3. If "halted" becomes true, hotplug more vCPUs
4. If "halted" never becomes true, don't hotplug more CPUs

If that's the case, then I guess grepping for State in
/proc/qemu-pid/threadid/status will have the same end result, no?

> A disabled guest CPU is represented as halted in the QEMU object model
> and can therefore be identified by the QMP query-cpus command.
>
> The initial patch proposal to expose this via virsh vcpuinfo was not
> considered to be desirable because there was a concern that legacy
> management software might be confused seeing halted vcpus. Therefore the
> state information was added to the cpu domain statistics.
>
> One issue we're facing is that the semantics of "halted" are different
> between s390 and at least x86. The question might be whether they are
> different enough to warrant a specific "disabled" indicator.
>
> [...]
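[Editor's illustration: a sketch of the /proc status check Luiz suggests
above. A vCPU thread parked in kvm_vcpu_block() is sleeping, so its
status file shows "State: S". Treating "sleeping" as "halted" is an
approximation (the thread can sleep for other reasons); the path layout
is the standard procfs one.]

    #include <stdio.h>

    /* Sketch only: check whether a vCPU thread is sleeping. */
    static int vcpu_thread_sleeping(long qemu_pid, long vcpu_tid)
    {
    	char path[64], line[128], st;
    	int sleeping = -1;
    	FILE *f;

    	snprintf(path, sizeof(path), "/proc/%ld/task/%ld/status",
    		 qemu_pid, vcpu_tid);
    	f = fopen(path, "r");
    	if (!f)
    		return -1;
    	while (fgets(line, sizeof(line), f)) {
    		if (sscanf(line, "State: %c", &st) == 1) {
    			sleeping = (st == 'S');
    			break;
    		}
    	}
    	fclose(f);
    	return sleeping;	/* 1 = sleeping, 0 = running, -1 = error */
    }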
On Fri, Feb 02, 2018 at 02:53:50PM +0100, Viktor Mihajlovski wrote:
> On 01.02.2018 21:26, Eduardo Habkost wrote:
> > On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> >> 2018-02-01 12:54-0500, Luiz Capitulino:
> >>>
> >>> Libvirt needs to know when a vCPU is halted. To get this information,
> >>
> >> I don't see why upper level management should care about that, a single
> >> bit about halted state that can be incorrect at the time it is processed
> >> seems of very limited use.
> >
> > I don't see why, either.
> >
> > I'm CCing libvir-list and the people involved in the code that
> > added halt state to libvirt domain statistics.
> >
> I'll try to explain the motivation for the "halted" state exposure and
> why it ended in the libvirt domain stats.
>
> s390 CPUs can be present in a system (e.g. after being hotplugged) but
> be offline (disabled), in which case they are not used by the operating
> system. In Linux, disabled CPUs show a value of '0' in
> /sys/devices/system/cpu/cpu<n>/online.
>
> Higher level management software (on top of libvirt) can take advantage
> of knowing whether a guest CPU is online and thus used or not.
> Specifically, it might not make sense to plug more CPUs if the guest OS
> isn't using the CPUs at all.

Wasn't this already represented on "vcpu.<n>.state"? Why is
"vcpu.<n>.halted" needed?

> A disabled guest CPU is represented as halted in the QEMU object model
> and can therefore be identified by the QMP query-cpus command.
>
> The initial patch proposal to expose this via virsh vcpuinfo was not
> considered to be desirable because there was a concern that legacy
> management software might be confused seeing halted vcpus. Therefore the
> state information was added to the cpu domain statistics.
>
> One issue we're facing is that the semantics of "halted" are different
> between s390 and at least x86. The question might be whether they are
> different enough to warrant a specific "disabled" indicator.

From your description, it looks like they are completely
different. On x86, a CPU that is online and in use can be moved
between halted and non-halted state many times a second.

If that's the case, we can probably fix this without breaking
existing code: explicitly documenting the semantics of
"vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
online" (i.e. the s390 semantics, not the x86 one), and making
qemuMonitorGetCpuHalted() s390-specific.

Possibly a better long-term solution is to deprecate
"vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
s390.

It would be also interesting to update QEMU QMP documentation to
clarify the arch-specific semantics of "halted".
On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
> On Fri, Feb 02, 2018 at 02:53:50PM +0100, Viktor Mihajlovski wrote:
> > On 01.02.2018 21:26, Eduardo Habkost wrote:
> > > On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> > >> 2018-02-01 12:54-0500, Luiz Capitulino:
> > >>>
> > >>> Libvirt needs to know when a vCPU is halted. To get this information,
> > >>
> > >> I don't see why upper level management should care about that, a single
> > >> bit about halted state that can be incorrect at the time it is processed
> > >> seems of very limited use.
> > >
> > > I don't see why, either.
> > >
> > > I'm CCing libvir-list and the people involved in the code that
> > > added halt state to libvirt domain statistics.
> > >
> > I'll try to explain the motivation for the "halted" state exposure and
> > why it ended in the libvirt domain stats.
> >
> > s390 CPUs can be present in a system (e.g. after being hotplugged) but
> > be offline (disabled), in which case they are not used by the operating
> > system. In Linux, disabled CPUs show a value of '0' in
> > /sys/devices/system/cpu/cpu<n>/online.
> >
> > Higher level management software (on top of libvirt) can take advantage
> > of knowing whether a guest CPU is online and thus used or not.
> > Specifically, it might not make sense to plug more CPUs if the guest OS
> > isn't using the CPUs at all.
>
> Wasn't this already represented on "vcpu.<n>.state"? Why is
> "vcpu.<n>.halted" needed?
>
> > A disabled guest CPU is represented as halted in the QEMU object model
> > and can therefore be identified by the QMP query-cpus command.
> >
> > The initial patch proposal to expose this via virsh vcpuinfo was not
> > considered to be desirable because there was a concern that legacy
> > management software might be confused seeing halted vcpus. Therefore the
> > state information was added to the cpu domain statistics.
> >
> > One issue we're facing is that the semantics of "halted" are different
> > between s390 and at least x86. The question might be whether they are
> > different enough to warrant a specific "disabled" indicator.
>
> From your description, it looks like they are completely
> different. On x86, a CPU that is online and in use can be moved
> between halted and non-halted state many times a second.
>
> If that's the case, we can probably fix this without breaking
> existing code: explicitly documenting the semantics of
> "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
> online" (i.e. the s390 semantics, not the x86 one), and making
> qemuMonitorGetCpuHalted() s390-specific.
>
> Possibly a better long-term solution is to deprecate
> "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
> s390.
>
> It would be also interesting to update QEMU QMP documentation to
> clarify the arch-specific semantics of "halted".

And also especially clarify the awful performance implications of
running this particular query command. In general I would not expect
query-xxx monitor commands to interrupt all vcpus, so we should clearly
warn about this!

Regards,
Daniel
On Fri, 2 Feb 2018 14:19:38 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
> > On Fri, Feb 02, 2018 at 02:53:50PM +0100, Viktor Mihajlovski wrote:
> > > On 01.02.2018 21:26, Eduardo Habkost wrote:
> > > > On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> > > >> 2018-02-01 12:54-0500, Luiz Capitulino:
> > > >>>
> > > >>> Libvirt needs to know when a vCPU is halted. To get this information,
> > > >>
> > > >> I don't see why upper level management should care about that, a single
> > > >> bit about halted state that can be incorrect at the time it is processed
> > > >> seems of very limited use.
> > > >
> > > > I don't see why, either.
> > > >
> > > > I'm CCing libvir-list and the people involved in the code that
> > > > added halt state to libvirt domain statistics.
> > > >
> > > I'll try to explain the motivation for the "halted" state exposure and
> > > why it ended in the libvirt domain stats.
> > >
> > > s390 CPUs can be present in a system (e.g. after being hotplugged) but
> > > be offline (disabled), in which case they are not used by the operating
> > > system. In Linux, disabled CPUs show a value of '0' in
> > > /sys/devices/system/cpu/cpu<n>/online.
> > >
> > > Higher level management software (on top of libvirt) can take advantage
> > > of knowing whether a guest CPU is online and thus used or not.
> > > Specifically, it might not make sense to plug more CPUs if the guest OS
> > > isn't using the CPUs at all.
> >
> > Wasn't this already represented on "vcpu.<n>.state"? Why is
> > "vcpu.<n>.halted" needed?
> >
> > > A disabled guest CPU is represented as halted in the QEMU object model
> > > and can therefore be identified by the QMP query-cpus command.
> > >
> > > The initial patch proposal to expose this via virsh vcpuinfo was not
> > > considered to be desirable because there was a concern that legacy
> > > management software might be confused seeing halted vcpus. Therefore the
> > > state information was added to the cpu domain statistics.
> > >
> > > One issue we're facing is that the semantics of "halted" are different
> > > between s390 and at least x86. The question might be whether they are
> > > different enough to warrant a specific "disabled" indicator.
> >
> > From your description, it looks like they are completely
> > different. On x86, a CPU that is online and in use can be moved
> > between halted and non-halted state many times a second.
> >
> > If that's the case, we can probably fix this without breaking
> > existing code: explicitly documenting the semantics of
> > "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
> > online" (i.e. the s390 semantics, not the x86 one), and making
> > qemuMonitorGetCpuHalted() s390-specific.
> >
> > Possibly a better long-term solution is to deprecate
> > "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
> > s390.
> >
> > It would be also interesting to update QEMU QMP documentation to
> > clarify the arch-specific semantics of "halted".
>
> And also especially clarify the awful performance implications of
> running this particular query command. In general I would not expect
> query-xxx monitor commands to interrupt all vcpus, so we should clearly
> warn about this!

Or deprecate it...
(CCing qemu-devel)

On Fri, Feb 02, 2018 at 09:21:59AM -0500, Luiz Capitulino wrote:
> On Fri, 2 Feb 2018 14:19:38 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
[...]
> > > It would be also interesting to update QEMU QMP documentation to
> > > clarify the arch-specific semantics of "halted".
> >
> > And also especially clarify the awful performance implications of
> > running this particular query command. In general I would not expect
> > query-xxx monitor commands to interrupt all vcpus, so we should
> > clearly warn about this!
>
> Or deprecate it...

We could deprecate the expensive fields on query-cpus, and move
them to a more expensive query-cpu-state command. I believe most
users of query-cpus are only interested in qom_path, thread_id,
and topology info.

Markus, Eric: from the QAPI point of view, is it OK to remove
fields between QEMU versions, as long as we follow our
deprecation policy?
On Fri, 2 Feb 2018 12:50:14 -0200
Eduardo Habkost <ehabkost@redhat.com> wrote:

> (CCing qemu-devel)
>
> On Fri, Feb 02, 2018 at 09:21:59AM -0500, Luiz Capitulino wrote:
> > On Fri, 2 Feb 2018 14:19:38 +0000
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
> [...]
> > > > It would be also interesting to update QEMU QMP documentation to
> > > > clarify the arch-specific semantics of "halted".
> > >
> > > And also especially clarify the awful performance implications of
> > > running this particular query command. In general I would not expect
> > > query-xxx monitor commands to interrupt all vcpus, so we should
> > > clearly warn about this!
> >
> > Or deprecate it...
>
> We could deprecate the expensive fields on query-cpus, and move
> them to a more expensive query-cpu-state command. I believe most
> users of query-cpus are only interested in qom_path, thread_id,
> and topology info.

Agreed. The only thing I'm unsure about is whether the performance
issue is only present on x86. If so, do we deprecate it only for x86 or
for all archs? Probably for all archs, otherwise this has the potential
to turn into a mess.

> Markus, Eric: from the QAPI point of view, is it OK to remove
> fields between QEMU versions, as long as we follow our
> deprecation policy?

I guess we can't remove fields, but maybe we could always return
"running" and skip interrupting the vCPU threads.
On Fri, Feb 02, 2018 at 12:50:14PM -0200, Eduardo Habkost wrote:
> (CCing qemu-devel)
>
> On Fri, Feb 02, 2018 at 09:21:59AM -0500, Luiz Capitulino wrote:
> > On Fri, 2 Feb 2018 14:19:38 +0000
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
> [...]
> > > > It would be also interesting to update QEMU QMP documentation to
> > > > clarify the arch-specific semantics of "halted".
> > >
> > > And also especially clarify the awful performance implications of
> > > running this particular query command. In general I would not expect
> > > query-xxx monitor commands to interrupt all vcpus, so we should
> > > clearly warn about this!
> >
> > Or deprecate it...
>
> We could deprecate the expensive fields on query-cpus, and move
> them to a more expensive query-cpu-state command. I believe most
> users of query-cpus are only interested in qom_path, thread_id,
> and topology info.
>
> Markus, Eric: from the QAPI point of view, is it OK to remove
> fields between QEMU versions, as long as we follow our
> deprecation policy?

I would expect that to not be OK. A fully backwards compatible way to
deal with this would just be to add a flag to the query-cpus command,
e.g. something like

    query-cpus arch-specific=false

to turn off all this arch-specific state and just report the cheap
generic info. If it defaults to arch-specific=true when omitted, then
there are no compat problems.

Regards,
Daniel
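[Editor's illustration: a concrete sketch of what this proposal could
look like on the wire. The arch-specific argument is hypothetical (it
does not exist in QEMU as of this thread), and the returned fields and
values are illustrative only.]

    -> { "execute": "query-cpus",
         "arguments": { "arch-specific": false } }
    <- { "return": [
           { "CPU": 0, "qom_path": "/machine/unattached/device[0]",
             "thread_id": 17520, "props": { "core-id": 0 } },
           { "CPU": 1, "qom_path": "/machine/unattached/device[2]",
             "thread_id": 17521, "props": { "core-id": 1 } }
         ] }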
On 02.02.2018 15:15, Eduardo Habkost wrote:
> On Fri, Feb 02, 2018 at 02:53:50PM +0100, Viktor Mihajlovski wrote:
>> On 01.02.2018 21:26, Eduardo Habkost wrote:
>>> On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
>>>> 2018-02-01 12:54-0500, Luiz Capitulino:
>>>>>
>>>>> Libvirt needs to know when a vCPU is halted. To get this information,
>>>>
>>>> I don't see why upper level management should care about that, a single
>>>> bit about halted state that can be incorrect at the time it is processed
>>>> seems of very limited use.
>>>
>>> I don't see why, either.
>>>
>>> I'm CCing libvir-list and the people involved in the code that
>>> added halt state to libvirt domain statistics.
>>>
>> I'll try to explain the motivation for the "halted" state exposure and
>> why it ended in the libvirt domain stats.
>>
>> s390 CPUs can be present in a system (e.g. after being hotplugged) but
>> be offline (disabled), in which case they are not used by the operating
>> system. In Linux, disabled CPUs show a value of '0' in
>> /sys/devices/system/cpu/cpu<n>/online.
>>
>> Higher level management software (on top of libvirt) can take advantage
>> of knowing whether a guest CPU is online and thus used or not.
>> Specifically, it might not make sense to plug more CPUs if the guest OS
>> isn't using the CPUs at all.
>
> Wasn't this already represented on "vcpu.<n>.state"? Why is
> "vcpu.<n>.halted" needed?

The state would match that of vcpuinfo, and there was consensus not to
change it (on x86 the CPU is in state running, even if halted).

>> A disabled guest CPU is represented as halted in the QEMU object model
>> and can therefore be identified by the QMP query-cpus command.
>>
>> The initial patch proposal to expose this via virsh vcpuinfo was not
>> considered to be desirable because there was a concern that legacy
>> management software might be confused seeing halted vcpus. Therefore the
>> state information was added to the cpu domain statistics.
>>
>> One issue we're facing is that the semantics of "halted" are different
>> between s390 and at least x86. The question might be whether they are
>> different enough to warrant a specific "disabled" indicator.
>
> From your description, it looks like they are completely
> different. On x86, a CPU that is online and in use can be moved
> between halted and non-halted state many times a second.
>
> If that's the case, we can probably fix this without breaking
> existing code: explicitly documenting the semantics of
> "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
> online" (i.e. the s390 semantics, not the x86 one), and making
> qemuMonitorGetCpuHalted() s390-specific.
>
> Possibly a better long-term solution is to deprecate
> "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
> s390.

As it seems that nobody was ever *really* interested in x86.halted, one
could also return 0 unconditionally there (and for other
expensive-to-query arches)?

> It would be also interesting to update QEMU QMP documentation to
> clarify the arch-specific semantics of "halted".
On Fri, 2 Feb 2018 16:08:25 +0100
Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> wrote:

> >> A disabled guest CPU is represented as halted in the QEMU object model
> >> and can therefore be identified by the QMP query-cpus command.
> >>
> >> The initial patch proposal to expose this via virsh vcpuinfo was not
> >> considered to be desirable because there was a concern that legacy
> >> management software might be confused seeing halted vcpus. Therefore the
> >> state information was added to the cpu domain statistics.
> >>
> >> One issue we're facing is that the semantics of "halted" are different
> >> between s390 and at least x86. The question might be whether they are
> >> different enough to warrant a specific "disabled" indicator.
> >
> > From your description, it looks like they are completely
> > different. On x86, a CPU that is online and in use can be moved
> > between halted and non-halted state many times a second.
> >
> > If that's the case, we can probably fix this without breaking
> > existing code: explicitly documenting the semantics of
> > "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
> > online" (i.e. the s390 semantics, not the x86 one), and making
> > qemuMonitorGetCpuHalted() s390-specific.
> >
> > Possibly a better long-term solution is to deprecate
> > "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
> > s390.
>
> As it seems that nobody was ever *really* interested in x86.halted, one
> could also return 0 unconditionally there (and for other
> expensive-to-query arches)?

The most important question I have is: does this solution satisfy the
needs of upper management? That is, if we implement the solution
suggested by Eduardo, then the feature of automatically hotplugging
more CPUs will only work for s390. Is this OK?

If yes, then I think this is the best solution. And the next question
would be: Viktor, can you change this in libvirt while we fix query-cpus
in QEMU?

Btw, I guess OpenStack ran into this issue just because this field
slipped into the domstats API and ceilometer issues that command...

> > It would be also interesting to update QEMU QMP documentation to
> > clarify the arch-specific semantics of "halted".
On Fri, Feb 02, 2018 at 03:07:18PM +0000, Daniel P. Berrangé wrote:
> On Fri, Feb 02, 2018 at 12:50:14PM -0200, Eduardo Habkost wrote:
> > (CCing qemu-devel)
> >
> > On Fri, Feb 02, 2018 at 09:21:59AM -0500, Luiz Capitulino wrote:
> > > On Fri, 2 Feb 2018 14:19:38 +0000
> > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
> > [...]
> > > > > It would be also interesting to update QEMU QMP documentation to
> > > > > clarify the arch-specific semantics of "halted".
> > > >
> > > > And also especially clarify the awful performance implications of
> > > > running this particular query command. In general I would not expect
> > > > query-xxx monitor commands to interrupt all vcpus, so we should
> > > > clearly warn about this!
> > >
> > > Or deprecate it...
> >
> > We could deprecate the expensive fields on query-cpus, and move
> > them to a more expensive query-cpu-state command. I believe most
> > users of query-cpus are only interested in qom_path, thread_id,
> > and topology info.
> >
> > Markus, Eric: from the QAPI point of view, is it OK to remove
> > fields between QEMU versions, as long as we follow our
> > deprecation policy?
>
> I would expect that to not be OK. A fully backwards compatible way to
> deal with this would just be to add a flag to the query-cpus command,
> e.g. something like
>
>     query-cpus arch-specific=false
>
> to turn off all this arch-specific state and just report the cheap
> generic info. If it defaults to arch-specific=true when omitted, then
> there are no compat problems.

This would work, too. I would name it "full-state", "extended-state",
or something similar, though. Not all arch-specific data is expensive
to fetch, and not all non-arch-specific data is inexpensive.

But I'd like to confirm if it's OK to make existing non-optional struct
fields optional in the QAPI schema. Markus, Eric?
On 02.02.2018 16:22, Luiz Capitulino wrote:
> On Fri, 2 Feb 2018 16:08:25 +0100
> Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> wrote:
>
>>>> A disabled guest CPU is represented as halted in the QEMU object model
>>>> and can therefore be identified by the QMP query-cpus command.
>>>>
>>>> The initial patch proposal to expose this via virsh vcpuinfo was not
>>>> considered to be desirable because there was a concern that legacy
>>>> management software might be confused seeing halted vcpus. Therefore the
>>>> state information was added to the cpu domain statistics.
>>>>
>>>> One issue we're facing is that the semantics of "halted" are different
>>>> between s390 and at least x86. The question might be whether they are
>>>> different enough to warrant a specific "disabled" indicator.
>>>
>>> From your description, it looks like they are completely
>>> different. On x86, a CPU that is online and in use can be moved
>>> between halted and non-halted state many times a second.
>>>
>>> If that's the case, we can probably fix this without breaking
>>> existing code: explicitly documenting the semantics of
>>> "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
>>> online" (i.e. the s390 semantics, not the x86 one), and making
>>> qemuMonitorGetCpuHalted() s390-specific.
>>>
>>> Possibly a better long-term solution is to deprecate
>>> "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
>>> s390.
>>
>> As it seems that nobody was ever *really* interested in x86.halted, one
>> could also return 0 unconditionally there (and for other
>> expensive-to-query arches)?
>
> The most important question I have is: does this solution satisfy the
> needs of upper management? That is, if we implement the solution
> suggested by Eduardo, then the feature of automatically hotplugging
> more CPUs will only work for s390. Is this OK?
>
> If yes, then I think this is the best solution. And the next question
> would be: Viktor, can you change this in libvirt while we fix query-cpus
> in QEMU?
>
The latest proposal was to use a flag for query-cpus (like full-state)
which would control the set of properties queried and reported. If this
is the way we decide to go, I can make the necessary changes in libvirt.

> Btw, I guess OpenStack ran into this issue just because this field
> slipped into the domstats API and ceilometer issues that command...
>
>>> It would be also interesting to update QEMU QMP documentation to
>>> clarify the arch-specific semantics of "halted".
On Fri, Feb 02, 2018 at 04:51:23PM +0100, Viktor Mihajlovski wrote:
> On 02.02.2018 16:22, Luiz Capitulino wrote:
> > On Fri, 2 Feb 2018 16:08:25 +0100
> > Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> wrote:
> >
> >>>> A disabled guest CPU is represented as halted in the QEMU object model
> >>>> and can therefore be identified by the QMP query-cpus command.
> >>>>
> >>>> The initial patch proposal to expose this via virsh vcpuinfo was not
> >>>> considered to be desirable because there was a concern that legacy
> >>>> management software might be confused seeing halted vcpus. Therefore the
> >>>> state information was added to the cpu domain statistics.
> >>>>
> >>>> One issue we're facing is that the semantics of "halted" are different
> >>>> between s390 and at least x86. The question might be whether they are
> >>>> different enough to warrant a specific "disabled" indicator.
> >>>
> >>> From your description, it looks like they are completely
> >>> different. On x86, a CPU that is online and in use can be moved
> >>> between halted and non-halted state many times a second.
> >>>
> >>> If that's the case, we can probably fix this without breaking
> >>> existing code: explicitly documenting the semantics of
> >>> "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
> >>> online" (i.e. the s390 semantics, not the x86 one), and making
> >>> qemuMonitorGetCpuHalted() s390-specific.
> >>>
> >>> Possibly a better long-term solution is to deprecate
> >>> "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
> >>> s390.
> >>
> >> As it seems that nobody was ever *really* interested in x86.halted, one
> >> could also return 0 unconditionally there (and for other
> >> expensive-to-query arches)?
> >
> > The most important question I have is: does this solution satisfy the
> > needs of upper management? That is, if we implement the solution
> > suggested by Eduardo, then the feature of automatically hotplugging
> > more CPUs will only work for s390. Is this OK?
> >
> > If yes, then I think this is the best solution. And the next question
> > would be: Viktor, can you change this in libvirt while we fix query-cpus
> > in QEMU?
> >
> The latest proposal was to use a flag for query-cpus (like full-state)
> which would control the set of properties queried and reported. If this
> is the way we decide to go, I can make the necessary changes in libvirt.

Regardless of whether we add that flag to query-cpus or not, we still
have the general problem of making the cross-architecture semantics
more sane.

Regards,
Daniel
On Fri, 2 Feb 2018 16:51:23 +0100
Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> wrote:

> On 02.02.2018 16:22, Luiz Capitulino wrote:
> > On Fri, 2 Feb 2018 16:08:25 +0100
> > Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> wrote:
> >
> >>>> A disabled guest CPU is represented as halted in the QEMU object model
> >>>> and can therefore be identified by the QMP query-cpus command.
> >>>>
> >>>> The initial patch proposal to expose this via virsh vcpuinfo was not
> >>>> considered to be desirable because there was a concern that legacy
> >>>> management software might be confused seeing halted vcpus. Therefore the
> >>>> state information was added to the cpu domain statistics.
> >>>>
> >>>> One issue we're facing is that the semantics of "halted" are different
> >>>> between s390 and at least x86. The question might be whether they are
> >>>> different enough to warrant a specific "disabled" indicator.
> >>>
> >>> From your description, it looks like they are completely
> >>> different. On x86, a CPU that is online and in use can be moved
> >>> between halted and non-halted state many times a second.
> >>>
> >>> If that's the case, we can probably fix this without breaking
> >>> existing code: explicitly documenting the semantics of
> >>> "vcpu.<n>.halted" at virConnectGetAllDomainStats() to mean "not
> >>> online" (i.e. the s390 semantics, not the x86 one), and making
> >>> qemuMonitorGetCpuHalted() s390-specific.
> >>>
> >>> Possibly a better long-term solution is to deprecate
> >>> "vcpu.<n>.halted" and make "vcpu.<n>.state" work correctly on
> >>> s390.
> >>
> >> As it seems that nobody was ever *really* interested in x86.halted, one
> >> could also return 0 unconditionally there (and for other
> >> expensive-to-query arches)?
> >
> > The most important question I have is: does this solution satisfy the
> > needs of upper management? That is, if we implement the solution
> > suggested by Eduardo, then the feature of automatically hotplugging
> > more CPUs will only work for s390. Is this OK?
> >
> > If yes, then I think this is the best solution. And the next question
> > would be: Viktor, can you change this in libvirt while we fix query-cpus
> > in QEMU?
> >
> The latest proposal was to use a flag for query-cpus (like full-state)
> which would control the set of properties queried and reported. If this
> is the way we decide to go, I can make the necessary changes in libvirt.

OK, I thought we were going to do both. Because, if libvirt only wants
the halted field for s390, then why issue query-cpus at all on other
archs?

> > Btw, I guess OpenStack ran into this issue just because this field
> > slipped into the domstats API and ceilometer issues that command...
> >
> >>> It would be also interesting to update QEMU QMP documentation to
> >>> clarify the arch-specific semantics of "halted".
On Fri, 2 Feb 2018 15:54:15 +0000
Daniel P. Berrangé <berrange@redhat.com> wrote:

> > > The most important question I have is: does this solution satisfy the
> > > needs of upper management? That is, if we implement the solution
> > > suggested by Eduardo, then the feature of automatically hotplugging
> > > more CPUs will only work for s390. Is this OK?
> > >
> > > If yes, then I think this is the best solution. And the next question
> > > would be: Viktor, can you change this in libvirt while we fix query-cpus
> > > in QEMU?
> >
> > The latest proposal was to use a flag for query-cpus (like full-state)
> > which would control the set of properties queried and reported. If this
> > is the way we decide to go, I can make the necessary changes in libvirt.
>
> Regardless of whether we add that flag to query-cpus or not, we still
> have the general problem of making the cross-architecture semantics
> more sane.

Let's do both then:

 o Make qemuDomainRefreshVcpuHalted() s390-only in libvirt. This by
   itself fixes the original performance issue

 o Deprecate the "halted" field in query-cpus in QEMU. This fixes new
   instances of this same problem
On Fri, 2 Feb 2018 11:01:37 -0500
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> On Fri, 2 Feb 2018 15:54:15 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> > > > The most important question I have is: does this solution satisfy the
> > > > needs of upper management? That is, if we implement the solution
> > > > suggested by Eduardo, then the feature of automatically hotplugging
> > > > more CPUs will only work for s390. Is this OK?
> > > >
> > > > If yes, then I think this is the best solution. And the next question
> > > > would be: Viktor, can you change this in libvirt while we fix query-cpus
> > > > in QEMU?
> > >
> > > The latest proposal was to use a flag for query-cpus (like full-state)
> > > which would control the set of properties queried and reported. If this
> > > is the way we decide to go, I can make the necessary changes in libvirt.
> >
> > Regardless of whether we add that flag to query-cpus or not, we still
> > have the general problem of making the cross-architecture semantics
> > more sane.
>
> Let's do both then:
>
>  o Make qemuDomainRefreshVcpuHalted() s390-only in libvirt. This by
>    itself fixes the original performance issue
>
>  o Deprecate the "halted" field in query-cpus in QEMU. This fixes new
>    instances of this same problem

Btw, let me emphasize that I think the "halted" field has to be
deprecated from the default query-cpus output, otherwise new instances
of this issue are possible.
On 02.02.2018 17:01, Luiz Capitulino wrote:
> On Fri, 2 Feb 2018 15:54:15 +0000
> Daniel P. Berrangé <berrange@redhat.com> wrote:
>
>>>> The most important question I have is: does this solution satisfy the
>>>> needs of upper management? That is, if we implement the solution
>>>> suggested by Eduardo, then the feature of automatically hotplugging
>>>> more CPUs will only work for s390. Is this OK?
>>>>
>>>> If yes, then I think this is the best solution. And the next question
>>>> would be: Viktor, can you change this in libvirt while we fix query-cpus
>>>> in QEMU?
>>>>
>>> The latest proposal was to use a flag for query-cpus (like full-state)
>>> which would control the set of properties queried and reported. If this
>>> is the way we decide to go, I can make the necessary changes in libvirt.
>>
>> Regardless of whether we add that flag to query-cpus or not, we still
>> have the general problem of making the cross-architecture semantics
>> more sane.
>
> Let's do both then:
>
>  o Make qemuDomainRefreshVcpuHalted() s390-only in libvirt. This by
>    itself fixes the original performance issue

We are normally trying to avoid architecture-specific code in libvirt
(not always successfully). We could omit the call, based on a QEMU
capability derived from the presence of said flag. This would change
the libvirt-client-side default to not report halted. A client can then
still request the value via a TBD libvirt flag. Which is what an
s390-aware management app would have to do...

>  o Deprecate the "halted" field in query-cpus in QEMU. This fixes new
>    instances of this same problem
>
That is probably OK, if we can get a replacement (e.g. a new state).
* Eduardo Habkost (ehabkost@redhat.com) wrote:
> (CCing qemu-devel)
>
> On Fri, Feb 02, 2018 at 09:21:59AM -0500, Luiz Capitulino wrote:
> > On Fri, 2 Feb 2018 14:19:38 +0000
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
> [...]
> > > > It would be also interesting to update QEMU QMP documentation to
> > > > clarify the arch-specific semantics of "halted".
> > >
> > > And also especially clarify the awful performance implications of
> > > running this particular query command. In general I would not expect
> > > query-xxx monitor commands to interrupt all vcpus, so we should
> > > clearly warn about this!
> >
> > Or deprecate it...
>
> We could deprecate the expensive fields on query-cpus, and move
> them to a more expensive query-cpu-state command. I believe most
> users of query-cpus are only interested in qom_path, thread_id,
> and topology info.

Would that data be available without the BQL? I ask because if it is,
then a small advantage to having a separate command is that the command
could be marked OOB with Peter's new series and never take the lock.

Dave

> Markus, Eric: from the QAPI point of view, is it OK to remove
> fields between QEMU versions, as long as we follow our
> deprecation policy?
>
> --
> Eduardo

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Fri, Feb 02, 2018 at 05:23:59PM +0000, Dr. David Alan Gilbert wrote:
> * Eduardo Habkost (ehabkost@redhat.com) wrote:
> > (CCing qemu-devel)
> >
> > On Fri, Feb 02, 2018 at 09:21:59AM -0500, Luiz Capitulino wrote:
> > > On Fri, 2 Feb 2018 14:19:38 +0000
> > > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > > On Fri, Feb 02, 2018 at 12:15:54PM -0200, Eduardo Habkost wrote:
> > [...]
> > > > > It would be also interesting to update QEMU QMP documentation to
> > > > > clarify the arch-specific semantics of "halted".
> > > >
> > > > And also especially clarify the awful performance implications of
> > > > running this particular query command. In general I would not expect
> > > > query-xxx monitor commands to interrupt all vcpus, so we should
> > > > clearly warn about this!
> > >
> > > Or deprecate it...
> >
> > We could deprecate the expensive fields on query-cpus, and move
> > them to a more expensive query-cpu-state command. I believe most
> > users of query-cpus are only interested in qom_path, thread_id,
> > and topology info.
>
> Would that data be available without the BQL? I ask because if it is,
> then a small advantage to having a separate command is that the command
> could be marked OOB with Peter's new series and never take the lock.

We would need a mechanism to safely walk the CPU list without the BQL,
so I wouldn't bother trying to make an OOB-capable version of
query-cpus unless really necessary.
On Fri, Feb 02, 2018 at 05:19:34PM +0100, Viktor Mihajlovski wrote: > On 02.02.2018 17:01, Luiz Capitulino wrote: [...] > > o Make qemuDomainRefreshVcpuHalted() s390-only in libvirt. This by > > itself fixes the original performance issue > We are normally trying to avoid architecture-specific code in libvirt > (not always successfully). We could omit the call, based on a QEMU > Capability derived from the presence of said flag. This would change the > libvirt-client side default to not report halted. A client can the still > request the value via a tbd libvirt flag. Which is what an s390-aware > management app would have to do...

The problem I see here is that the current semantics of the "halted" field in QEMU is arch-specific, so either libvirt or upper layers will necessarily need arch-specific code if they want to support QEMU 2.11 or older.
On Fri, 2 Feb 2018 15:42:49 -0200 Eduardo Habkost <ehabkost@redhat.com> wrote: > On Fri, Feb 02, 2018 at 05:19:34PM +0100, Viktor Mihajlovski wrote: > > On 02.02.2018 17:01, Luiz Capitulino wrote: > [...] > > > o Make qemuDomainRefreshVcpuHalted() s390-only in libvirt. This by > > > itself fixes the original performance issue > > We are normally trying to avoid architecture-specific code in libvirt > > (not always successfully). We could omit the call, based on a QEMU > > Capability derived from the presence of said flag. This would change the > > libvirt-client side default to not report halted. A client can the still > > request the value via a tbd libvirt flag. Which is what an s390-aware > > management app would have to do... > > The problem I see here is that the current semantics of the > "halted" field in QEMU is arch-specific, so either libvirt or > upper layers will necessarily need arch-specific code if they > want to support QEMU 2.11 or older.

My understanding of this plan is:

1. Deprecate the "halted" field in query-cpus (that is, make it always return halted=false)

2. Add a new command, say query-cpu-state, which is arch dependent and is only available in archs that support sane "halted" semantics (I guess we can have per-arch QMP commands, right?)

3. Modify libvirt to use query-cpu-state if it's available, otherwise use query-cpus (in which case "halted" will be bogus, but that's a feature :) )

In essence, we're moving the arch-specific code from libvirt to qemu.
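To make the QMP mechanics under discussion concrete, here is a minimal client sketch in C. It is illustrative only: the monitor socket path is an assumption (QEMU started with -qmp unix:/tmp/qmp.sock,server,nowait), and it issues the existing query-cpus command, since the query-cpu-state command proposed above is hypothetical at this point. The capabilities handshake is the mandatory first step of any QMP session.

/* Minimal QMP client sketch (assumptions: QEMU started with
 * "-qmp unix:/tmp/qmp.sock,server,nowait"; one read() per reply is
 * good enough for a demo -- a real client would buffer to newlines). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/un.h>

static void xfer(int fd, const char *cmd)
{
    char buf[65536];
    ssize_t n;

    if (cmd && write(fd, cmd, strlen(cmd)) < 0)
        return;
    n = read(fd, buf, sizeof(buf) - 1);
    if (n > 0) {
        buf[n] = '\0';
        fputs(buf, stdout);
    }
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);

    strncpy(addr.sun_path, "/tmp/qmp.sock", sizeof(addr.sun_path) - 1);
    if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("qmp connect");
        return 1;
    }
    xfer(fd, NULL);                                  /* server greeting */
    xfer(fd, "{\"execute\": \"qmp_capabilities\"}\n");
    /* With an in-kernel irqchip this is exactly the call that kicks
     * every vCPU out of guest mode to fetch "halted". */
    xfer(fd, "{\"execute\": \"query-cpus\"}\n");
    close(fd);
    return 0;
}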
On Fri, Feb 02, 2018 at 01:50:33PM -0500, Luiz Capitulino wrote: > On Fri, 2 Feb 2018 15:42:49 -0200 > Eduardo Habkost <ehabkost@redhat.com> wrote: > > > On Fri, Feb 02, 2018 at 05:19:34PM +0100, Viktor Mihajlovski wrote: > > > On 02.02.2018 17:01, Luiz Capitulino wrote: > > [...] > > > > o Make qemuDomainRefreshVcpuHalted() s390-only in libvirt. This by > > > > itself fixes the original performance issue > > > We are normally trying to avoid architecture-specific code in libvirt > > > (not always successfully). We could omit the call, based on a QEMU > > > Capability derived from the presence of said flag. This would change the > > > libvirt-client side default to not report halted. A client can the still > > > request the value via a tbd libvirt flag. Which is what an s390-aware > > > management app would have to do... > > > > The problem I see here is that the current semantics of the > > "halted" field in QEMU is arch-specific, so either libvirt or > > upper layers will necessarily need arch-specific code if they > > want to support QEMU 2.11 or older. > > My understanding of this plan is: > > 1. Deprecate the "halted" field in query-cpus (that is, make it > always return halted=false)

I don't think we really need to do this. If we do, this should be the last step (after libvirt is already using the new interfaces).

> > 2. Add a new command, say query-cpu-state, which is arch dependent > and is only available in archs that support sane "halted" > semantics (I guess we can have per-arch QMP commands, right?)

I don't see why we would make the new command arch-dependent. We need two new interfaces:

1) A lightweight version of query-cpus that won't interrupt the VCPUs. This can be a new command, or a new parameter to query-cpus.

2) An arch-independent way to query "CPU is online" state, as the existing "halted" field has confusing arch-specific semantics.

> > 3. Modify libvirt to use query-cpu-state if it's available, > otherwise use query-cpus (in which case "halted" will be bogus, > but that's a feature :) ) > > In essence, we're moving the arch-specific code from libvirt to > qemu.

Your plan above covers what will happen when using newer QEMU versions, but libvirt still needs to work sanely if running QEMU 2.11. My suggestion is that libvirt do not run query-cpus to ask for the "halted" field on any architecture except s390.
On Fri, 2 Feb 2018 18:09:12 -0200 Eduardo Habkost <ehabkost@redhat.com> wrote: > On Fri, Feb 02, 2018 at 01:50:33PM -0500, Luiz Capitulino wrote: > > On Fri, 2 Feb 2018 15:42:49 -0200 > > Eduardo Habkost <ehabkost@redhat.com> wrote: > > > > > On Fri, Feb 02, 2018 at 05:19:34PM +0100, Viktor Mihajlovski wrote: > > > > On 02.02.2018 17:01, Luiz Capitulino wrote: > > > [...] > > > > > o Make qemuDomainRefreshVcpuHalted() s390-only in libvirt. This by > > > > > itself fixes the original performance issue > > > > We are normally trying to avoid architecture-specific code in libvirt > > > > (not always successfully). We could omit the call, based on a QEMU > > > > Capability derived from the presence of said flag. This would change the > > > > libvirt-client side default to not report halted. A client can the still > > > > request the value via a tbd libvirt flag. Which is what an s390-aware > > > > management app would have to do... > > > > > > The problem I see here is that the current semantics of the > > > "halted" field in QEMU is arch-specific, so either libvirt or > > > upper layers will necessarily need arch-specific code if they > > > want to support QEMU 2.11 or older. > > > > My understanding of this plan is: > > > > 1. Deprecate the "halted" field in query-cpus (that is, make it > > always return halted=false) > > I don't think we really need to do this. If we do, this should > be the last step (after libvirt is already using the new > interfaces).

Yeah, I've just started taking a look at how to implement this plan and my first conclusion was to leave the current query-cpus alone.

> > 2. Add a new command, say query-cpu-state, which is arch dependent > > and is only available in archs that support sane "halted" > > semantics (I guess we can have per-arch QMP commands, right?) > > I don't see why we would make the new command arch-dependent. We > need two new interfaces: > > 1) A lightweight version of query-cpus that won't interrupt the > VCPUs. This can be a new command, or a new parameter to > query-cpus.

Exactly how I thought of doing it. I prefer a new command so that we untangle this from query-cpus.

> 2) An arch-independent way to query "CPU is online" state, as the > existing "halted" field has confusing arch-specific semantics.

Honest question: is it at all possible for QEMU to know a CPU is online? My impression is that the halted thing is the best we can do. If we need better than this, then libvirt should use the guest agent instead.

Btw, I haven't checked all archs yet, but I'm under the impression the halted thing is confusing in x86 because of the in kernel irqchip. Otherwise this state is maintained by qemu itself.

> > 3. Modify libvirt to use query-cpu-state if it's available, > > otherwise use query-cpus (in which case "halted" will be bogus, > > but that's a feature :) ) > > > > In essence, we're moving the arch-specific code from libvirt to > > qemu. > > Your plan above covers what will happen when using newer QEMU > versions, but libvirt still needs to work sanely if running QEMU > 2.11. My suggestion is that libvirt do not run query-cpus to ask > for the "halted" field on any architecture except s390.

My current plan is to ask libvirt to completely remove query-cpus usage, independent of the arch and use the new command instead.
On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote: > On Fri, 2 Feb 2018 18:09:12 -0200 > Eduardo Habkost <ehabkost@redhat.com> wrote: [...] > > Your plan above covers what will happen when using newer QEMU > > versions, but libvirt still needs to work sanely if running QEMU > > 2.11. My suggestion is that libvirt do not run query-cpus to ask > > for the "halted" field on any architecture except s390. > > My current plan is to ask libvirt to completely remove query-cpus > usage, independent of the arch and use the new command instead.

This would be a regression for people running QEMU 2.11 on s390.

(But maybe it would be an acceptable regression? Viktor, what do you think? Are there production releases of management systems that already rely on vcpu.<n>.halted?)
On Fri, 2 Feb 2018 18:41:44 -0200 Eduardo Habkost <ehabkost@redhat.com> wrote: > On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote: > > On Fri, 2 Feb 2018 18:09:12 -0200 > > Eduardo Habkost <ehabkost@redhat.com> wrote: > [...] > > > Your plan above covers what will happen when using newer QEMU > > > versions, but libvirt still needs to work sanely if running QEMU > > > 2.11. My suggestion is that libvirt do not run query-cpus to ask > > > for the "halted" field on any architecture except s390. > > > > My current plan is to ask libvirt to completely remove query-cpus > > usage, independent of the arch and use the new command instead. > > This would be a regression for people running QEMU 2.11 on s390.

libvirt could use query-cpus on s390 if the new command is not available.

> > (But maybe it would be an acceptable regression? Viktor, what do > you think? Are there production releases of management systems > that already rely on vcpu.<n>.halted?) >
On Fri, 2 Feb 2018 16:49:54 -0500 Luiz Capitulino <lcapitulino@redhat.com> wrote: > On Fri, 2 Feb 2018 18:41:44 -0200 > Eduardo Habkost <ehabkost@redhat.com> wrote: > > > On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote: > > > On Fri, 2 Feb 2018 18:09:12 -0200 > > > Eduardo Habkost <ehabkost@redhat.com> wrote: > > [...] > > > > Your plan above covers what will happen when using newer QEMU > > > > versions, but libvirt still needs to work sanely if running QEMU > > > > 2.11. My suggestion is that libvirt do not run query-cpus to ask > > > > for the "halted" field on any architecture except s390. > > > > > > My current plan is to ask libvirt to completely remove query-cpus > > > usage, independent of the arch and use the new command instead. > > > > This would be a regression for people running QEMU 2.11 on s390. > > libvirt could use query-cpus on s390 if the new command is not > available.

Btw, even on s390 query-cpus runs run_on_cpu(), which is what causes vCPUs to go to user-space. So, s390 should suffer the same performance problem. Pick your regression.

> > > > (But maybe it would be an acceptable regression? Viktor, what do > > you think? Are there production releases of management systems > > that already rely on vcpu.<n>.halted?) > > >
On 02.02.2018 21:41, Eduardo Habkost wrote: > On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote: >> On Fri, 2 Feb 2018 18:09:12 -0200 >> Eduardo Habkost <ehabkost@redhat.com> wrote: > [...] >>> Your plan above covers what will happen when using newer QEMU >>> versions, but libvirt still needs to work sanely if running QEMU >>> 2.11. My suggestion is that libvirt do not run query-cpus to ask >>> for the "halted" field on any architecture except s390. >> >> My current plan is to ask libvirt to completely remove query-cpus >> usage, independent of the arch and use the new command instead. > > This would be a regression for people running QEMU 2.11 on s390. > > (But maybe it would be an acceptable regression? Viktor, what do > you think? Are there production releases of management systems > that already rely on vcpu.<n>.halted?) >

Unfortunately, there's code out there looking at vcpu.<n>.halted. I've informed the product team about the issue.

If we drop/deprecate vcpu.<n>.halted from the domain statistics, this should be done for all arches, if there's a replacement mechanism (i.e. new VCPU states). As a stop-gap measure we can make the call arch-dependent until the new stuff is in place.

BTW: libvirt cannot eliminate query-cpus entirely because initial CPU topology and changes due to hotplug must be queried.
On Mon, Feb 05, 2018 at 02:43:15PM +0100, Viktor Mihajlovski wrote: > On 02.02.2018 21:41, Eduardo Habkost wrote: > > On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote: > >> On Fri, 2 Feb 2018 18:09:12 -0200 > >> Eduardo Habkost <ehabkost@redhat.com> wrote: > > [...] > >>> Your plan above covers what will happen when using newer QEMU > >>> versions, but libvirt still needs to work sanely if running QEMU > >>> 2.11. My suggestion is that libvirt do not run query-cpus to ask > >>> for the "halted" field on any architecture except s390. > >> > >> My current plan is to ask libvirt to completely remove query-cpus > >> usage, independent of the arch and use the new command instead. > > > > This would be a regression for people running QEMU 2.11 on s390. > > > > (But maybe it would be an acceptable regression? Viktor, what do > > you think? Are there production releases of management systems > > that already rely on vcpu.<n>.halted?) > > > Unfortunately, there's code out there looking at vcpu.<n>.halted. I've > informed the product team about the issue. > > If we drop/deprecate vcpu.<n>.halted from the domain statistics, this > should be done for all arches, if there's a replacement mechanism (i.e. > new VCPU states). As a stop-gap measure we can make the call > arch-dependent until the new stuff is in place.

Yes, I think libvirt should just restrict this 'halted' feature reporting to s390 only, since the other archs have different semantics for this item, and the s390 semantics are the ones we want. If we figure out a better solution for getting the data on s390 then we can change that to not use query-cpus at all, and possibly extend it to other archs too.

> BTW: libvirt cannot eliminate query-cpus entirely because initial CPU > topology and changes due to hotplug must be queried.

At that point, libvirt has not started vCPUs, so we are ok wrt the performance implications.

Regards, Daniel
On Mon, 5 Feb 2018 13:47:27 +0000 Daniel P. Berrangé <berrange@redhat.com> wrote: > On Mon, Feb 05, 2018 at 02:43:15PM +0100, Viktor Mihajlovski wrote: > > On 02.02.2018 21:41, Eduardo Habkost wrote: > > > On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote: > > >> On Fri, 2 Feb 2018 18:09:12 -0200 > > >> Eduardo Habkost <ehabkost@redhat.com> wrote: > > > [...] > > >>> Your plan above covers what will happen when using newer QEMU > > >>> versions, but libvirt still needs to work sanely if running QEMU > > >>> 2.11. My suggestion is that libvirt do not run query-cpus to ask > > >>> for the "halted" field on any architecture except s390. > > >> > > >> My current plan is to ask libvirt to completely remove query-cpus > > >> usage, independent of the arch and use the new command instead. > > > > > > This would be a regression for people running QEMU 2.11 on s390. > > > > > > (But maybe it would be an acceptable regression? Viktor, what do > > > you think? Are there production releases of management systems > > > that already rely on vcpu.<n>.halted?) > > > > > Unfortunately, there's code out there looking at vcpu.<n>.halted. I've > > informed the product team about the issue. > > > > If we drop/deprecate vcpu.<n>.halted from the domain statistics, this > > should be done for all arches, if there's a replacement mechanism (i.e. > > new VCPU states). As a stop-gap measure we can make the call > > arch-dependent until the new stuff is in place. > > Yes, I think libvirt should just restrict this 'halted' feature reporting > to s390 only, since the other archs have different semantics for this > item, and the s390 semantics are the ones we want.

From this whole discussion, there's only one thing that I still don't understand (in a very honest way): what makes s390 halted semantics different?

By quickly looking at the code, it seems to be very like the x86 one when in kernel irqchip is not used: if a guest vCPU executes HLT, the vCPU exits to userspace and qemu will put the vCPU thread to sleep. This is the semantics I'd expect for HLT, and maybe for all archs.

What makes x86 different is when the in kernel irqchip is used (which should be the default with libvirt). In this case, the vCPU thread avoids exiting to user-space. So, qemu doesn't know the vCPU halted.

That's only one of the reasons why query-cpus forces vCPUs to user-space. But there are other reasons, and that's why even on s390 query-cpus will also force vCPUs to user-space, which means s390 has the same perf issue but maybe this hasn't been detected yet.

For the immediate term, I still think we should have a query-cpus replacement that doesn't cause vCPUs to go to userspace. I'll work on this this week.

However, IMHO, what we really want is to add an API to the guest agent to export the CPU online bit from the guest userspace sysfs. This will give the ultimate semantics and move us away from this halted mess.
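To make the guest-agent idea concrete: the "online bit" in question is plain guest sysfs state that an agent API would merely relay to the host. Below is a minimal in-guest sketch in C; it is an illustration, not a proposed qemu-ga interface. Note that on x86 cpu0 usually has no "online" attribute because it cannot be offlined, so it is assumed online here.

/* Sketch: enumerate CPU online state from guest sysfs, i.e. the bit a
 * guest-agent API could export to the host. */
#include <stdio.h>

int main(void)
{
    char path[64];
    int cpu, online;

    for (cpu = 0; ; cpu++) {
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/online", cpu);
        f = fopen(path, "r");
        if (!f) {
            if (cpu == 0) {        /* boot CPU: no file, assume online */
                printf("cpu0: online\n");
                continue;
            }
            break;                 /* no more CPUs */
        }
        if (fscanf(f, "%d", &online) == 1)
            printf("cpu%d: %s\n", cpu, online ? "online" : "offline");
        fclose(f);
    }
    return 0;
}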
On 05.02.2018 16:37, Luiz Capitulino wrote: > On Mon, 5 Feb 2018 13:47:27 +0000 > Daniel P. Berrangé <berrange@redhat.com> wrote: > >> On Mon, Feb 05, 2018 at 02:43:15PM +0100, Viktor Mihajlovski wrote: >>> On 02.02.2018 21:41, Eduardo Habkost wrote: >>>> On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote: >>>>> On Fri, 2 Feb 2018 18:09:12 -0200 >>>>> Eduardo Habkost <ehabkost@redhat.com> wrote: >>>> [...] >>>>>> Your plan above covers what will happen when using newer QEMU >>>>>> versions, but libvirt still needs to work sanely if running QEMU >>>>>> 2.11. My suggestion is that libvirt do not run query-cpus to ask >>>>>> for the "halted" field on any architecture except s390. >>>>> >>>>> My current plan is to ask libvirt to completely remove query-cpus >>>>> usage, independent of the arch and use the new command instead. >>>> >>>> This would be a regression for people running QEMU 2.11 on s390. >>>> >>>> (But maybe it would be an acceptable regression? Viktor, what do >>>> you think? Are there production releases of management systems >>>> that already rely on vcpu.<n>.halted?) >>>> >>> Unfortunately, there's code out there looking at vcpu.<n>.halted. I've >>> informed the product team about the issue. >>> >>> If we drop/deprecate vcpu.<n>.halted from the domain statistics, this >>> should be done for all arches, if there's a replacement mechanism (i.e. >>> new VCPU states). As a stop-gap measure we can make the call >>> arch-dependent until the new stuff is in place. >> >> Yes, I think libvirt should just restrict this 'halted' feature reporting >> to s390 only, since the other archs have different semantics for this >> item, and the s390 semantics are the ones we want. > > From this whole discussion, there's only one thing that I still don't > understand (in a very honest way): what makes s390 halted semantics > different?

One problem is that using the halted property to indicate that the CPU has assumed the architected disabled wait state may not have been the wisest decision (my fault). If the CPU enters disabled wait, it will stay inactive until it is explicitly restarted, which is different on x86.

> > By quickly looking at the code, it seems to be very like the x86 one > when in kernel irqchip is not used: if a guest vCPU executes HLT, the > vCPU exits to userspace and qemu will put the vCPU thread to sleep. > This is the semantics I'd expect for HLT, and maybe for all archs. > > What makes x86 different, is when the in kernel irqchip is used (which > should be the default with libvirt). In this case, the vCPU thread avoids > exiting to user-space. So, qemu doesn't know the vCPU halted. > > That's only one of the reasons why query-cpus forces vCPUs to user-space. > But there are other reasons, and that's why even on s390 query-cpus > will also force vCPUs to user-space, which means s390 has the same perf > issue but maybe this hasn't been detected yet. > > For the immediate term, I still think we should have a query-cpus > replacement that doesn't cause vCPUs to go to userspace. I'll work on this > this week.

FWIW: I'm currently exploring an extension to query-cpus to report s390-specific information, allowing to ditch halted in the long run. Further, I'm considering a new QAPI event along the lines of "CPU info has changed" allowing QEMU to announce low-frequency changes of CPU state (as is the case for s390) and finally wire up a handler in libvirt to update a tbd. property (!= halted).
> > However, IMHO, what we really want is to add an API to the guest agent > to export the CPU online bit from the guest userspace sysfs. This will > give the ultimate semantics and move us away from this halted mess. >
On Mon, 5 Feb 2018 17:10:18 +0100 Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> wrote: > On 05.02.2018 16:37, Luiz Capitulino wrote: > > On Mon, 5 Feb 2018 13:47:27 +0000 > > Daniel P. Berrangé <berrange@redhat.com> wrote: > > > >> On Mon, Feb 05, 2018 at 02:43:15PM +0100, Viktor Mihajlovski wrote: > >>> On 02.02.2018 21:41, Eduardo Habkost wrote: > >>>> On Fri, Feb 02, 2018 at 03:19:45PM -0500, Luiz Capitulino wrote: > >>>>> On Fri, 2 Feb 2018 18:09:12 -0200 > >>>>> Eduardo Habkost <ehabkost@redhat.com> wrote: > >>>> [...] > >>>>>> Your plan above covers what will happen when using newer QEMU > >>>>>> versions, but libvirt still needs to work sanely if running QEMU > >>>>>> 2.11. My suggestion is that libvirt do not run query-cpus to ask > >>>>>> for the "halted" field on any architecture except s390. > >>>>> > >>>>> My current plan is to ask libvirt to completely remove query-cpus > >>>>> usage, independent of the arch and use the new command instead. > >>>> > >>>> This would be a regression for people running QEMU 2.11 on s390. > >>>> > >>>> (But maybe it would be an acceptable regression? Viktor, what do > >>>> you think? Are there production releases of management systems > >>>> that already rely on vcpu.<n>.halted?) > >>>> > >>> Unfortunately, there's code out there looking at vcpu.<n>.halted. I've > >>> informed the product team about the issue. > >>> > >>> If we drop/deprecate vcpu.<n>.halted from the domain statistics, this > >>> should be done for all arches, if there's a replacement mechanism (i.e. > >>> new VCPU states). As a stop-gap measure we can make the call > >>> arch-dependent until the new stuff is in place. > >> > >> Yes, I think libvirt should just restrict this 'halted' feature reporting > >> to s390 only, since the other archs have different semantics for this > >> item, and the s390 semantics are the ones we want. > > > > From this whole discussion, there's only one thing that I still don't > > understand (in a very honest way): what makes s390 halted semantics > > different? > One problem is that using the halted property to indicate that the CPU > has assumed the architected disabled wait state may not have been the > wisest decision (my fault). If the CPU enters disabled wait, it will > stay inactive until it is explicitly restarted, which is different on x86.

Ah, OK. So, s390 does indeed have different semantics.

> > By quickly looking at the code, it seems to be very like the x86 one > > when in kernel irqchip is not used: if a guest vCPU executes HLT, the > > vCPU exits to userspace and qemu will put the vCPU thread to sleep. > > This is the semantics I'd expect for HLT, and maybe for all archs. > > > > What makes x86 different, is when the in kernel irqchip is used (which > > should be the default with libvirt). In this case, the vCPU thread avoids > > exiting to user-space. So, qemu doesn't know the vCPU halted. > > > > That's only one of the reasons why query-cpus forces vCPUs to user-space. > > But there are other reasons, and that's why even on s390 query-cpus > > will also force vCPUs to user-space, which means s390 has the same perf > > issue but maybe this hasn't been detected yet. > > > > For the immediate term, I still think we should have a query-cpus > > replacement that doesn't cause vCPUs to go to userspace. I'll work on this > > this week. > FWIW: I'm currently exploring an extension to query-cpus to report > s390-specific information, allowing to ditch halted in the long run.
> Further, I'm considering a new QAPI event along the lines of "CPU info > has changed" allowing QEMU to announce low-frequency changes of CPU > state (as is the case for s390) and finally wire up a handler in libvirt > to update a tbd. property (!= halted).

I very much prefer adding a replacement for query-cpus, which works for all archs and which doesn't have any performance impact.

> > > > However, IMHO, what we really want is to add an API to the guest agent > > to export the CPU online bit from the guest userspace sysfs. This will > > give the ultimate semantics and move us away from this halted mess. > > >
On Mon, Feb 05, 2018 at 10:37:01AM -0500, Luiz Capitulino wrote: [...] > However, IMHO, what we really want is to add an API to the guest agent > to export the CPU online bit from the guest userspace sysfs. This will > give the ultimate semantics and move us away from this halted mess.

Can't this be detected on the host side without guest agent? Why do we need the guest to tell us what's the state of the virtual hardware we're emulating?
On Mon, 5 Feb 2018 20:50:20 -0200 Eduardo Habkost <ehabkost@redhat.com> wrote: > On Mon, Feb 05, 2018 at 10:37:01AM -0500, Luiz Capitulino wrote: > [...] > > However, IMHO, what we really want is to add an API to the guest agent > > to export the CPU online bit from the guest userspace sysfs. This will > > give the ultimate semantics and move us away from this halted mess. > > Can't this be detected on the host side without guest agent? Why > do we need the guest to tell us what's the state of the virtual > hardware we're emulating?

I have no idea.
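For what it's worth, one coarse host-side approximation needs neither the guest agent nor a forced VM exit: sample the vCPU thread's scheduler state from procfs. A vCPU whose HLT is emulated in the kernel shows up on the host as a sleeping task, but threads sleep for many other reasons, so this is strictly a heuristic; the TID would come from query-cpus' thread_id. A sketch:

/* Heuristic sketch: report the scheduler state of a vCPU thread from
 * /proc/<tid>/stat.  'S' (sleeping) is what a halted vCPU looks like
 * from the host, but it is only an approximation. */
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char path[64], buf[256], state, *p;
    FILE *f;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <vcpu-tid>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path), "/proc/%s/stat", argv[1]);
    f = fopen(path, "r");
    if (!f || !fgets(buf, sizeof(buf), f)) {
        perror(path);
        return 1;
    }
    /* Field 3 is the state; scan past the comm field, which is in
     * parentheses and may itself contain spaces (e.g. "CPU 0/KVM"). */
    p = strrchr(buf, ')');
    if (p && sscanf(p + 1, " %c", &state) == 1)
        printf("tid %s state=%c\n", argv[1], state);
    fclose(f);
    return 0;
}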
On 01.02.2018 21:26, Eduardo Habkost wrote: > On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote: >> 2018-02-01 12:54-0500, Luiz Capitulino: >>> >>> Libvirt needs to know when a vCPU is halted. To get this information, >> >> I don't see why upper level management should care about that, a single >> bit about halted state that can be incorrect at the time it is processed >> seems of very limited use. > > I don't see why, either. > > I'm CCing libvir-list and the people involved in the code that > added halt state to libvirt domain statistics. >

I've posted a libvirt patch to disable query-cpus in the domain stats collection for all architectures other than s390. This should alleviate the problem until we have a better method to obtain arch-specific cpu state.
On Tue, 6 Feb 2018 11:29:46 +0100 Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> wrote: > On 01.02.2018 21:26, Eduardo Habkost wrote: > > On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote: > >> 2018-02-01 12:54-0500, Luiz Capitulino: > >>> > >>> Libvirt needs to know when a vCPU is halted. To get this information, > >> > >> I don't see why upper level management should care about that, a single > >> bit about halted state that can be incorrect at the time it is processed > >> seems of very limited use. > > > > I don't see why, either. > > > > I'm CCing libvir-list and the people involved in the code that > > added halt state to libvirt domain statistics. > > > I've posted a libvirt patch to disable query-cpus in the domain stats > collection for all architectures other than s390. This should alleviate > the problem until we have a better method to obtain arch-specific cpu state.

This is cool, thanks a lot.

Btw, I'll follow up this week with a replacement for query-cpus in QEMU. Then we can discuss again the best long-term strategy for this issue.
diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index c19c7ede9bd6..056dd1c787bc 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -15,6 +15,15 @@ bool kvm_arch_has_vcpu_debugfs(void)
 	return true;
 }
 
+static int vcpu_get_halted(void *data, u64 *val)
+{
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
+	*val = vcpu->halted;
+	return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(vcpu_halted_fops, vcpu_get_halted, NULL, "%lld\n");
+
 static int vcpu_get_tsc_offset(void *data, u64 *val)
 {
 	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data;
@@ -51,6 +60,12 @@ int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
 	if (!ret)
 		return -ENOMEM;
 
+	ret = debugfs_create_file("halted", 0444,
+				  vcpu->debugfs_dentry,
+				  vcpu, &vcpu_halted_fops);
+	if (!ret)
+		return -ENOMEM;
+
 	if (kvm_has_tsc_control) {
 		ret = debugfs_create_file("tsc-scaling-ratio", 0444,
 					  vcpu->debugfs_dentry,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c13cd14c4780..9841841d186b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6273,6 +6273,7 @@ void kvm_arch_exit(void)
 
 int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
 {
+	kvm_vcpu_set_halted(vcpu);
 	++vcpu->stat.halt_exits;
 	if (lapic_in_kernel(vcpu)) {
 		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;
@@ -7204,6 +7205,7 @@ static int vcpu_run(struct kvm_vcpu *vcpu)
 
 	for (;;) {
 		if (kvm_vcpu_running(vcpu)) {
+			kvm_vcpu_set_running(vcpu);
 			r = vcpu_enter_guest(vcpu);
 		} else {
 			r = vcpu_block(kvm, vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ac0062b74aed..430a4d06b0fb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -272,10 +272,21 @@ struct kvm_vcpu {
 	} spin_loop;
 #endif
 	bool preempted;
+	bool halted;
 	struct kvm_vcpu_arch arch;
 	struct dentry *debugfs_dentry;
 };
 
+static inline void kvm_vcpu_set_running(struct kvm_vcpu *vcpu)
+{
+	vcpu->halted = 0;
+}
+
+static inline void kvm_vcpu_set_halted(struct kvm_vcpu *vcpu)
+{
+	vcpu->halted = 1;
+}
+
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
 {
 	/*
Libvirt needs to know when a vCPU is halted. To get this information, libvirt has started using the query-cpus command from QEMU. However, if in kernel irqchip is in use, query-cpus will force all vCPUs to user-space since they have to issue the KVM_GET_MP_STATE ioctl. This has catastrophic implications to low-latency workloads like KVM-RT and zero packet loss with DPDK. To make matters worse, there's an OpenStack service called ceilometer that causes libvirt to issue query-cpus every few minutes.

The solution proposed in this patch is to export the vCPU halted state in the already existing vcpu directory in sysfs. This way, libvirt can read the vCPU halted state from sysfs and avoid using the query-cpus command. This solution seems to be sufficient for libvirt needs, but it has the following cons:

* vcpu information in sysfs lives in a debug directory, so libvirt would be basing its API on debug info

* Currently, only x86 supports the vcpu dir in sysfs, so we'd have to expand this to other archs (should be doable)

If we agree that this solution is feasible, I'll work on extending the vcpu debug information to other archs for my next posting.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 arch/x86/kvm/debugfs.c   | 15 +++++++++++++++
 arch/x86/kvm/x86.c       |  2 ++
 include/linux/kvm_host.h | 11 +++++++++++
 3 files changed, 28 insertions(+)
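For completeness, here is a sketch of how a management application might poll the attribute this patch adds. Assumptions: debugfs is mounted at /sys/kernel/debug, KVM names its per-VM directory <pid>-<vm_fd> with per-vCPU vcpu<id> subdirectories (the layout used by the existing x86 vcpu debug attributes), and the reader has root privileges. The point of the patch is that this read costs a debugfs access rather than a VM exit.

/* Sketch: read the proposed per-vCPU "halted" debugfs attribute.
 * The directory layout below is an assumption; adjust to taste. */
#include <stdio.h>

int main(int argc, char **argv)
{
    char path[128];
    int halted;
    FILE *f;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <pid-fd> <vcpu-id>\n", argv[0]);
        return 1;
    }
    snprintf(path, sizeof(path),
             "/sys/kernel/debug/kvm/%s/vcpu%s/halted", argv[1], argv[2]);
    f = fopen(path, "r");
    if (!f || fscanf(f, "%d", &halted) != 1) {
        perror(path);
        return 1;
    }
    printf("vcpu%s: %s\n", argv[2], halted ? "halted" : "running");
    fclose(f);
    return 0;
}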