Message ID | 20170821094654.xgzppysitxt3i6sz@MacBook-Pro-de-Roger.local (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
>> Attached ZIP file contains hypervisor and qemu log. > Thanks, so the guest is indeed unmasking the MSIX table entries before > MSIX is enabled, and QEMU only registers the entries with Xen once MSIX > is enabled: > The patch I've sent earlier (which I'm also attaching to this email) > implements the first solution for QEMU. > Can you please give it a try? Results are inconsistent. On the first run it seemed OK (Windows had a problem of its own). On a later run, Windows again hang during boot. qemu logs are different on both runs but I was not able to separate different runs in xl-dmesg. Regards Andreas
>>> On 21.08.17 at 11:46, <roger.pau@citrix.com> wrote: > Xen never detects the MSIX table entry unmask because it happens > before the MSIX is bound to the guest, so the Xen msixtbl handlers are > not aware of this memory region. > > The two possible solutions I see to this are: > > - Make QEMU setup the vectors when the table entries are unmasked, > even if MSIX is not enabled. > - Provide an hypercall so QEMU can unmask MSIX vectors on behalf of > the guest. This would be used to unmask the entries if MSIX is > enabled with table entries already unmasked. Neither sounds right. As long as MSI-X is disabled (or the maskall bit set), the contents of the table entries is meaningless. It is not correct to assign any meaning to them in that state. And qemu should not be unmasking the entries on behalf of the guest either. It ought to be possible for Xen to know the state of the mask bits at the time of binding the interrupts. After all, with a new hypercall added like you propose it, there would still be a window in time where neither setting (masked or unmasked) would be correct in Xen's internal recording of state. Quite possibly the only solution is for qemu to communicate the intended state of the mask bit while binding the interrupt. Jan
>> - Make QEMU setup the vectors when the table entries are unmasked, >> even if MSIX is not enabled. >> - Provide an hypercall so QEMU can unmask MSIX vectors on behalf of >> the guest. This would be used to unmask the entries if MSIX is >> enabled with table entries already unmasked. > Neither sounds right. As long as MSI-X is disabled (or the maskall > bit set), the contents of the table entries is meaningless. It is not > correct to assign any meaning to them in that state. And qemu > should not be unmasking the entries on behalf of the guest either. > It ought to be possible for Xen to know the state of the mask bits > at the time of binding the interrupts. After all, with a new hypercall > added like you propose it, there would still be a window in time > where neither setting (masked or unmasked) would be correct in > Xen's internal recording of state. Quite possibly the only solution > is for qemu to communicate the intended state of the mask bit > while binding the interrupt. Hello Jan + Roger, as someone who is not familiar with MSI-X stuff, I still have one obvious question: Why was all of this not a problem before the commit I bisected down? Everything did work before and to be honest there is a "business" side to my question: Since that patch it has actually been broken for at least 3 major Xen releaes (4.6-4.8). Does that make any sense? For the reputation of Xen? Regards Andreas
>>> On 21.08.17 at 14:32, <hfp@posteo.de> wrote: >> > - Make QEMU setup the vectors when the table entries are unmasked, >>> even if MSIX is not enabled. >>> - Provide an hypercall so QEMU can unmask MSIX vectors on behalf of >>> the guest. This would be used to unmask the entries if MSIX is >>> enabled with table entries already unmasked. >> Neither sounds right. As long as MSI-X is disabled (or the maskall >> bit set), the contents of the table entries is meaningless. It is not >> correct to assign any meaning to them in that state. And qemu >> should not be unmasking the entries on behalf of the guest either. >> It ought to be possible for Xen to know the state of the mask bits >> at the time of binding the interrupts. After all, with a new hypercall >> added like you propose it, there would still be a window in time >> where neither setting (masked or unmasked) would be correct in >> Xen's internal recording of state. Quite possibly the only solution >> is for qemu to communicate the intended state of the mask bit >> while binding the interrupt. > > as someone who is not familiar with MSI-X stuff, I still have one obvious > question: Why was all of this not a problem before the commit I bisected > down? Everything did work before and to be honest there is a "business" > side to my question: Since that patch it has actually been broken for at > least 3 major Xen releaes (4.6-4.8). Does that make any sense? For the > reputation of Xen? For the reputation of Xen this is not nice, I agree. But this being an open source project, you would have been free to contribute a fix in all of this time. In no case is it, imo, appropriate to demand an immediate solution here - if there's a business aspect for you and you don't feel capable of analyzing and fixing the issue yourself, you always have the option of paying someone who is. The fact that things had worked before that change does _not_ mean the change was bad. It's is far more likely for things to have worked out of pure luck. Jan
On Mon, Aug 21, 2017 at 06:22:05AM -0600, Jan Beulich wrote: > >>> On 21.08.17 at 11:46, <roger.pau@citrix.com> wrote: > > Xen never detects the MSIX table entry unmask because it happens > > before the MSIX is bound to the guest, so the Xen msixtbl handlers are > > not aware of this memory region. > > > > The two possible solutions I see to this are: > > > > - Make QEMU setup the vectors when the table entries are unmasked, > > even if MSIX is not enabled. > > - Provide an hypercall so QEMU can unmask MSIX vectors on behalf of > > the guest. This would be used to unmask the entries if MSIX is > > enabled with table entries already unmasked. > > Neither sounds right. As long as MSI-X is disabled (or the maskall > bit set), the contents of the table entries is meaningless. It is not > correct to assign any meaning to them in that state. Right, they only become relevant when MSI-X is enabled, the maskall bit is unset and the entry control register mask bit is also unset. > And qemu > should not be unmasking the entries on behalf of the guest either. > It ought to be possible for Xen to know the state of the mask bits > at the time of binding the interrupts. AFAICT QEMU will only bound the interrupts when they are unmasked, so the semantics of XEN_DOMCTL_bind_pt_irq could be changed so that MSI interrupts are unmasked when bound, but that will change the current behavior. Another option is to (ab)use the msi.gflags field to add another flag in order to signal Xen whether the interrupt should be unmasked. This is in line with what you suggest below. Roger.
>>> On 21.08.17 at 16:49, <roger.pau@citrix.com> wrote: > Another option is to (ab)use the msi.gflags field to add another flag > in order to signal Xen whether the interrupt should be unmasked. This > is in line with what you suggest below. From a brief look it looks like this would be doable, but the way these flags are being communicated is rather ugly (the values used here aren't part of the public interface, and hence it wasn't immediately clear whether using one of the unused bits would be an option, but it looks like it is). Jan
On Mon, Aug 21, 2017 at 09:14:45AM -0600, Jan Beulich wrote: > >>> On 21.08.17 at 16:49, <roger.pau@citrix.com> wrote: > > Another option is to (ab)use the msi.gflags field to add another flag > > in order to signal Xen whether the interrupt should be unmasked. This > > is in line with what you suggest below. > > From a brief look it looks like this would be doable, but the way these > flags are being communicated is rather ugly (the values used here > aren't part of the public interface, and hence it wasn't immediately > clear whether using one of the unused bits would be an option, but > it looks like it is). Yes, it's not pretty... Last used bit is 15, hence bit 16 could be used to signal to Xen whether the interrupt should be unmasked after binding. I have a half-drafted patch, will finish it now. Thanks, Roger.
diff --git a/hw/xen/xen_pt_msi.c b/hw/xen/xen_pt_msi.c index ff9a79f5d2..dfb8d64654 100644 --- a/hw/xen/xen_pt_msi.c +++ b/hw/xen/xen_pt_msi.c @@ -451,8 +451,12 @@ static void pci_msix_write(void *opaque, hwaddr addr, } entry->updated = true; - } else if (msix->enabled && entry->updated && - !(val & PCI_MSIX_ENTRY_CTRL_MASKBIT)) { + } else if (entry->updated && !(val & PCI_MSIX_ENTRY_CTRL_MASKBIT)) { + /* + * NB: always register the entries with Xen when the write to the MSIX + * table happens, or else Xen won't be able to correctly snoop the + * entry control register write, and will fails to unmask the vector. + */ const volatile uint32_t *vec_ctrl; /*