Message ID | 20200218144415.94722-1-vkuznets@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] target/i386: filter out VMX_PIN_BASED_POSTED_INTR when enabling SynIC | expand |
On 18/02/20 15:44, Vitaly Kuznetsov wrote: > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > --- > RFC: This is somewhat similar to eVMCS breakage and it is likely possible > to fix this in KVM. I decided to try QEMU first as this is a single > control and unlike eVMCS we don't need to keep a list of things to disable. I think you should disable "virtual-interrupt delivery" instead (which in turn requires "process posted interrupts" to be zero). That is the one that is incompatible with AutoEOI interrupts. The ugly part about fixing this in QEMU is that in theory it would be still possible to emulate virtual interrupt delivery and posted interrupts, because they operate on a completely disjoint APIC configuration than the host's. I'm not sure we want to go there though, so I'm thinking that again a KVM implementation is better. It acknowledges that this is just a limitation (workaround for a bug) in KVM. Paolo
Patchew URL: https://patchew.org/QEMU/20200218144415.94722-1-vkuznets@redhat.com/ Hi, This series failed the docker-quick@centos7 build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash make docker-image-centos7 V=1 NETWORK=1 time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === -........................................................................................... +..................................E........................................................ +====================================================================== +ERROR: test_pause (__main__.TestSingleBlockdev) +---------------------------------------------------------------------- +Traceback (most recent call last): + File "041", line 106, in test_pause --- Ran 91 tests -OK +FAILED (errors=1) TEST iotest-qcow2: 042 TEST iotest-qcow2: 043 TEST iotest-qcow2: 046 --- TEST iotest-qcow2: 283 Failures: 041 Failed 1 of 116 iotests make: *** [check-tests/check-block.sh] Error 1 Traceback (most recent call last): File "./tests/docker/docker.py", line 664, in <module> sys.exit(main()) --- raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=d102c1c213f9486daa4ddb9ff686404b', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-wjtutoo7/src/docker-src.2020-02-18-11.43.41.11223:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2. filter=--filter=label=com.qemu.instance.uuid=d102c1c213f9486daa4ddb9ff686404b make[1]: *** [docker-run] Error 1 make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-wjtutoo7/src' make: *** [docker-run-test-quick@centos7] Error 2 real 12m38.662s user 0m8.825s The full log is available at http://patchew.org/logs/20200218144415.94722-1-vkuznets@redhat.com/testing.docker-quick@centos7/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
Paolo Bonzini <pbonzini@redhat.com> writes: > On 18/02/20 15:44, Vitaly Kuznetsov wrote: >> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> >> --- >> RFC: This is somewhat similar to eVMCS breakage and it is likely possible >> to fix this in KVM. I decided to try QEMU first as this is a single >> control and unlike eVMCS we don't need to keep a list of things to disable. > > I think you should disable "virtual-interrupt delivery" instead (which > in turn requires "process posted interrupts" to be zero). That is the > one that is incompatible with AutoEOI interrupts. I'm fighting the symptoms, not the cause :-) My understanding is that when SynIC is enabled for CPU0 KVM does kvm_vcpu_update_apicv() vmx_refresh_apicv_exec_ctrl() pin_controls_set() for *all* vCPUs (KVM_REQ_APICV_UPDATE). I'm not sure why SECONDARY_EXEC_APIC_REGISTER_VIRT/SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY are not causing problems and only PIN_BASED_POSTED_INTR does as we clear them all (not very important atm). > > The ugly part about fixing this in QEMU is that in theory it would be > still possible to emulate virtual interrupt delivery and posted > interrupts, because they operate on a completely disjoint APIC > configuration than the host's. I'm not sure we want to go there though, > so I'm thinking that again a KVM implementation is better. It > acknowledges that this is just a limitation (workaround for a bug) in KVM. The KVM implementation will differ from what we've done to fix eVMCS. We will either need to keep the controls on (and additionally check kvm_vcpu_apicv_active() if guest tries to enable them) and again filter VMX MSR reads from the guest or do the filtering on MSR write from userspace (filter out the unsupported controls and not fail). Actually, I'm starting to think it would've been easier to just filter all VMX MSRs on KVM_SET_MSRS leaving only the supported controls and not fail the operation. That way we would've fixed both eVMCS and SynIC issues in a consistent way shifting the responsibility towards userspace (document that VMX MSRs are 'special' and enabling certain features may result in changes; if userspace wants to see the actual state it may issue KVM_GET_MSRS any time) May not be the worst solution after all...
no-reply@patchew.org writes: > Patchew URL: https://patchew.org/QEMU/20200218144415.94722-1-vkuznets@redhat.com/ > > > > Hi, > > This series failed the docker-quick@centos7 build test. Please find the testing commands and > their output below. If you have Docker installed, you can probably reproduce it > locally. Hm, honestly I don't see how this can be related to my patch: --- /tmp/qemu-test/src/tests/qemu-iotests/041.out 2020-02-18 14:42:30.000000000 +0000 +++ /tmp/qemu-test/build/tests/qemu-iotests/041.out.bad 2020-02-18 16:50:07.383069241 +0000 @@ -1,5 +1,29 @@ -........................................................................................... +..................................E........................................................ +====================================================================== +ERROR: test_pause (__main__.TestSingleBlockdev) +---------------------------------------------------------------------- ... +Exception: Timeout waiting for job to pause + something else is broken?
On 18/02/20 18:08, Vitaly Kuznetsov wrote: > Paolo Bonzini <pbonzini@redhat.com> writes: > >> On 18/02/20 15:44, Vitaly Kuznetsov wrote: >>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> >>> --- >>> RFC: This is somewhat similar to eVMCS breakage and it is likely possible >>> to fix this in KVM. I decided to try QEMU first as this is a single >>> control and unlike eVMCS we don't need to keep a list of things to disable. >> >> I think you should disable "virtual-interrupt delivery" instead (which >> in turn requires "process posted interrupts" to be zero). That is the >> one that is incompatible with AutoEOI interrupts. > > I'm fighting the symptoms, not the cause :-) My understanding is that > when SynIC is enabled for CPU0 KVM does > > kvm_vcpu_update_apicv() > vmx_refresh_apicv_exec_ctrl() > pin_controls_set() > > for *all* vCPUs (KVM_REQ_APICV_UPDATE). I'm not sure why > SECONDARY_EXEC_APIC_REGISTER_VIRT/SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY > are not causing problems and only PIN_BASED_POSTED_INTR does as we clear > them all (not very important atm). Let's take a step back, what is the symptom, i.e. how does it fail? Because thinking more about it, since we have separate VMCS we can set PIN_BASED_POSTED_INTR and SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY just fine in the vmcs02. The important part is to unconditionally call vmx_deliver_nested_posted_interrupt. Something like if (kvm_x86_ops->deliver_posted_interrupt(vcpu, vector)) { kvm_lapic_set_irr(vector, apic); kvm_make_request(KVM_REQ_EVENT, vcpu); kvm_vcpu_kick(vcpu); } and in vmx_deliver_posted_interrupt r = vmx_deliver_nested_posted_interrupt(vcpu, vector); if (!r) return 0; if (!vcpu->arch.apicv_active) return -1; ... return 0; Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > On 18/02/20 18:08, Vitaly Kuznetsov wrote: >> Paolo Bonzini <pbonzini@redhat.com> writes: >> >>> On 18/02/20 15:44, Vitaly Kuznetsov wrote: >>>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> >>>> --- >>>> RFC: This is somewhat similar to eVMCS breakage and it is likely possible >>>> to fix this in KVM. I decided to try QEMU first as this is a single >>>> control and unlike eVMCS we don't need to keep a list of things to disable. >>> >>> I think you should disable "virtual-interrupt delivery" instead (which >>> in turn requires "process posted interrupts" to be zero). That is the >>> one that is incompatible with AutoEOI interrupts. >> >> I'm fighting the symptoms, not the cause :-) My understanding is that >> when SynIC is enabled for CPU0 KVM does >> >> kvm_vcpu_update_apicv() >> vmx_refresh_apicv_exec_ctrl() >> pin_controls_set() >> >> for *all* vCPUs (KVM_REQ_APICV_UPDATE). I'm not sure why >> SECONDARY_EXEC_APIC_REGISTER_VIRT/SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY >> are not causing problems and only PIN_BASED_POSTED_INTR does as we clear >> them all (not very important atm). > > Let's take a step back, what is the symptom, i.e. how does it fail? I just do ~/qemu/x86_64-softmmu/qemu-system-x86_64 -machine q35,accel=kvm -cpu host,hv_vpindex,hv_synic -smp 2 -m 16384 -vnc :0 and get qemu-system-x86_64: error: failed to set MSR 0x48d to 0xff00000016 qemu-system-x86_64: /root/qemu/target/i386/kvm.c:2684: kvm_buf_set_msrs: Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. Aborted (it works with '-smp 1' or without 'hv_synic') > Because thinking more about it, since we have separate VMCS we can set > PIN_BASED_POSTED_INTR and SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY just fine > in the vmcs02. > The important part is to unconditionally call > vmx_deliver_nested_posted_interrupt. > > Something like > > if (kvm_x86_ops->deliver_posted_interrupt(vcpu, vector)) { > kvm_lapic_set_irr(vector, apic); > kvm_make_request(KVM_REQ_EVENT, vcpu); > kvm_vcpu_kick(vcpu); > } > > and in vmx_deliver_posted_interrupt > > r = vmx_deliver_nested_posted_interrupt(vcpu, vector); > if (!r) > return 0; > > if (!vcpu->arch.apicv_active) > return -1; > ... > return 0; Sound like a plan, let me try playing with it.
diff --git a/target/i386/kvm.c b/target/i386/kvm.c index 69eb43d796e6..6829b597fdbf 100644 --- a/target/i386/kvm.c +++ b/target/i386/kvm.c @@ -1366,6 +1366,7 @@ static Error *hv_no_nonarch_cs_mig_blocker; static int hyperv_init_vcpu(X86CPU *cpu) { CPUState *cs = CPU(cpu); + CPUX86State *env = &cpu->env; Error *local_err = NULL; int ret; @@ -1431,6 +1432,9 @@ static int hyperv_init_vcpu(X86CPU *cpu) return ret; } + /* When SynIC is enabled, APICv controls become unavailable */ + env->features[FEAT_VMX_PINBASED_CTLS] &= ~VMX_PIN_BASED_POSTED_INTR; + if (!cpu->hyperv_synic_kvm_only) { ret = hyperv_x86_synic_add(cpu); if (ret < 0) { @@ -1845,13 +1849,13 @@ int kvm_arch_init_vcpu(CPUState *cs) has_msr_tsc_aux = false; } - kvm_init_msrs(cpu); - r = hyperv_init_vcpu(cpu); if (r) { goto fail; } + kvm_init_msrs(cpu); + return 0; fail:
When a multi-vCPU guest is created with hv_synic, secondary vCPUs fail to initialize with qemu-system-x86_64: error: failed to set MSR 0x48d to 0xff00000016 This is caused by SynIC enablement on the boot CPU: when we do this KVM disables apicv for the whole guest so we can't set VMX_PIN_BASED_POSTED_INTR bit in MSR_IA32_VMX_TRUE_PINBASED_CTLS anymore. (see nested_vmx_setup_ctls_msrs() in KVM). This used to work before fine-grained VMX feature enablement because we were not setting VMX MSRs. Fix the issue by filtering out VMX_PIN_BASED_POSTED_INTR when enabling SynIC. We also need to re-order kvm_init_msrs() with hyperv_init_vcpu() so filtering on secondary CPUs happens before. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> --- RFC: This is somewhat similar to eVMCS breakage and it is likely possible to fix this in KVM. I decided to try QEMU first as this is a single control and unlike eVMCS we don't need to keep a list of things to disable. --- target/i386/kvm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)