Message ID | 1400468470-11262-5-git-send-email-gwshan@linux.vnet.ibm.com |
---|---|
State | New, archived |
On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote:
> The MSIx vector table lives in device memory, which may be cleared as
> part of a backdoor device reset. This is the case on the IBM IPR HBA
> when the BIST is run on the device. When assigned to a QEMU guest,
> the guest driver does a pci_save_state(), issues a BIST, then does a
> pci_restore_state(). The BIST clears the MSIx vector table, but due
> to the way interrupts are configured the pci_restore_state() does not
> restore the vector table as expected. Eventually this results in an
> EEH error on Power platforms when the device attempts to signal an
> interrupt with the zero'd table entry.
>
> Fix the problem by restoring the host cached MSI message prior to
> enabling each vector.
>
> Reported-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> [...]
> @@ -548,6 +549,20 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
>  		return PTR_ERR(trigger);
>  	}
>  
> +	/*
> +	 * The MSIx vector table resides in device memory which may be cleared
> +	 * via backdoor resets. We don't allow direct access to the vector
> +	 * table so even if a userspace driver attempts to save/restore around
> +	 * such a reset it would be unsuccessful. To avoid this, restore the
> +	 * cached value of the message prior to enabling.
> +	 */
> +	if (msix) {
> +		struct msi_msg msg;
> +
> +		get_cached_msi_msg(irq, &msg);
> +		write_msi_msg(irq, &msg);
> +	}

I think this is pretty ugly. Drivers should not be writing to the
MSI-X vector table, so I don't really want to export these internal
implementation functions if we can avoid it.

I chatted with Alex about this last week on IRC, trying to understand
what's going on here, but I'm afraid I didn't get very far.

I think I understand what happens when there's no virtualization
involved. The driver enables MSI-X and writes the vector table via
this path:

  pci_enable_msix
    msix_capability_init
      arch_setup_msi_irqs
        native_setup_msi_irqs         # .setup_msi_irqs (on x86)
          setup_msi_irq
            write_msi_msg
              __write_msi_msg         # write vector table

When a device is reset, its MSI-X vector table is cleared. The type
of reset (FLR, "backdoor", etc.) doesn't really matter.

After a device reset, the driver would use this path to restore the
vector table:

  pci_restore_state
    pci_restore_msi_state
      __pci_restore_msix_state
        arch_restore_msi_irqs
          default_restore_msi_irqs    # .restore_msi_irqs (on x86)
            default_restore_msi_irq
              write_msi_msg
                __write_msi_msg       # write vector table

This rewrites the MSI-X vector table (it doesn't use any data that was
saved by pci_save_state(), so it's not really a "restore" in that
sense; it writes the vector table from scratch based on the data
structures maintained by the MSI core).

If the same driver is running in a qemu guest, it still calls
pci_enable_msix() and pci_restore_state(), but apparently the restore
path doesn't work. Alex mentioned that qemu virtualizes the vector
table, so I assume it traps the writel() to the vector table when
enabling MSI-X? And I assume qemu would also trap the writel() in the
restore path, but it sounded like it ignores the write because we're
writing the same data qemu believes to be there?

I'd like to understand more details about how those writel()s
performed by the guest kernel are handled. Alex mentioned that the
vector table is inaccessible to the guest, and I see code in
vfio_pci_bar_rw() that looks like it excludes the table area, so I
assume that is involved somehow, but I don't know how to connect the
dots. Obviously the enable path must be handled differently from the
restore path somehow, because if the enable used vfio_pci_bar_rw(),
that write would just be dropped, too, and it's not.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 30, 2014 at 04:12:32PM -0600, Bjorn Helgaas wrote:
>On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote:
>> [...]
>
>I think this is pretty ugly. Drivers should not be writing to the
>MSI-X vector table, so I don't really want to export these internal
>implementation functions if we can avoid it.
>

I agree that it's ugly, and I need to discuss the potential solutions
with Alex: fix the issue either in the guest or in QEMU.

- If the "reset" is a special backdoor for some devices, the device
  driver on the guest side should do something like: disable the MSIx
  entries that have been enabled (updating the MSIx entries maintained
  by QEMU), pci_save_state(), reset(), pci_restore_state(), then enable
  the MSIx entries again (updating the MSIx entries maintained by QEMU).
  The disadvantage of this approach is that the guest driver has to
  accommodate QEMU, which sounds bad.
- In QEMU, we could have a per-device quirk that traps writes to the
  reset registers and, from there, clears the MSIx entries maintained
  by QEMU. It's similar to what is done for FLR. We would need a
  separate quirk to accommodate every kind of device.
- The last one is what we had (this patch). However, it's really a
  "hack".

>I chatted with Alex about this last week on IRC, trying to understand
>what's going on here, but I'm afraid I didn't get very far.
>
>[...]
>
>I'd like to understand more details about how those writel()s
>performed by the guest kernel are handled. Alex mentioned that the
>vector table is inaccessible to the guest, and I see code in
>vfio_pci_bar_rw() that looks like it excludes the table area, so I
>assume that is involved somehow, but I don't know how to connect the
>dots. Obviously the enable path must be handled differently from the
>restore path somehow, because if the enable used vfio_pci_bar_rw(),
>that write would just be dropped, too, and it's not.
>

The problem is basically that the MSIx entries maintained in QEMU no
longer match those in hardware (host kernel), which is caused by the
backdoor "reset":

- The guest driver enables the MSIx entries. They are marked as
  "enabled" in hardware, in QEMU, and in the guest.
- The guest driver calls pci_save_state() and then issues the backdoor
  reset. We lose everything in the MSIx table in hardware, but QEMU
  still maintains "enabled" MSIx entries.
- The guest driver calls pci_restore_state() and tries to enable the
  MSIx entries. The writes to the MSIx entries are trapped in QEMU,
  and QEMU won't update the MSIx entries in hardware because they are
  already marked as "enabled" in QEMU.

Thanks,
Gavin
On Sat, May 31, 2014 at 5:42 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
> On Fri, May 30, 2014 at 04:12:32PM -0600, Bjorn Helgaas wrote:
>>On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote:
>>> [...]
>>
>>I think this is pretty ugly. Drivers should not be writing to the
>>MSI-X vector table, so I don't really want to export these internal
>>implementation functions if we can avoid it.
>>
>
> I agree that it's ugly, and I need to discuss the potential solutions
> with Alex: fix the issue either in the guest or in QEMU.
>
> - If the "reset" is a special backdoor for some devices, the device
>   driver on the guest side should do something like: disable the MSIx
>   entries that have been enabled (updating the MSIx entries maintained
>   by QEMU), pci_save_state(), reset(), pci_restore_state(), then enable
>   the MSIx entries again (updating the MSIx entries maintained by QEMU).
>   The disadvantage of this approach is that the guest driver has to
>   accommodate QEMU, which sounds bad.

I agree, this sounds even worse.

> - In QEMU, we could have a per-device quirk that traps writes to the
>   reset registers and, from there, clears the MSIx entries maintained
>   by QEMU. It's similar to what is done for FLR. We would need a
>   separate quirk to accommodate every kind of device.

This also sounds bad.

> - The last one is what we had (this patch). However, it's really a
>   "hack".
>
>>[...]
>
> The problem is basically that the MSIx entries maintained in QEMU no
> longer match those in hardware (host kernel), which is caused by the
> backdoor "reset":
>
> - The guest driver enables the MSIx entries. They are marked as
>   "enabled" in hardware, in QEMU, and in the guest.
> - The guest driver calls pci_save_state() and then issues the backdoor
>   reset. We lose everything in the MSIx table in hardware, but QEMU
>   still maintains "enabled" MSIx entries.
> - The guest driver calls pci_restore_state() and tries to enable the
>   MSIx entries. The writes to the MSIx entries are trapped in QEMU,
>   and QEMU won't update the MSIx entries in hardware because they are
>   already marked as "enabled" in QEMU.

It sounds like QEMU assumes the MSIx entries can't be changed by
anything other than the writes it traps. This assumption is false
(the entries are cleared when the driver resets the device, and QEMU
doesn't know about the reset).

Why can't QEMU trap the write from pci_restore_state() and update the
hardware, even if it thinks nothing has changed?

Bjorn
On Mon, Jun 02, 2014 at 10:57:05AM -0600, Bjorn Helgaas wrote:
>On Sat, May 31, 2014 at 5:42 AM, Gavin Shan <gwshan@linux.vnet.ibm.com> wrote:
>> On Fri, May 30, 2014 at 04:12:32PM -0600, Bjorn Helgaas wrote:
>>>On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote:

.../...

[ Remove the confusing description ]

>It sounds like QEMU assumes the MSIx entries can't be changed by
>anything other than the writes it traps. This assumption is false
>(the entries are cleared when the driver resets the device, and QEMU
>doesn't know about the reset).
>

If I'm correct, QEMU disallows access to the MSIx table in HW. Accesses
are captured by QEMU and terminated there in most cases, so the MSIx
message can't be written to HW.

>Why can't QEMU trap the write from pci_restore_state() and update the
>hardware, even if it thinks nothing has changed?
>

For MSIx messages, pci_restore_state() restores what the device got
from QEMU. I think that MSIx message isn't the one HW expects (more
details below).

Sorry, Bjorn. I think my last reply confused you, as it's not correct.
The problem and the tentative fix have been around for some time, and
I had almost forgotten the details. I rechecked the discussion about
the topic, and it's not what I described in my last reply:

http://comments.gmane.org/gmane.comp.emulators.kvm.devel/119689

Let me correct it like this. Alex.W in the cc list is the VFIO expert.
I might have something wrong about VFIO, and Alex can help correct it :-)

1) Guest: the PCI device works fine in the guest.
2) QEMU: MSIx entry cache (unmasked). It seems the MSIx message
   maintained by QEMU is worked out by QEMU itself and is inconsistent
   with HW (host kernel). That's a separate (potential) issue. So QEMU
   and the host don't exchange the MSIx message with each other.
3) Guest: the PCI device driver calls pci_save_state(), issues the
   reset, then calls pci_restore_state().
4) QEMU traps this and notifies the VFIO PCI device to start the MSIx
   interrupt, which is done by an ioctl() to the VFIO PCI device on the
   host side. It seems that the VFIO device driver does request_irq()
   and sets up the irqfd so that the interrupt can be propagated to
   QEMU. The problem is that the MSIx message was lost because of the
   reset and, unfortunately, nobody tries to restore the message to
   hardware. Eventually, the PCI device sends DMA traffic (for the MSIx
   interrupt) with an address/data of 0x0, which isn't allowed on the
   Power platform and causes an EEH error.

Since the MSIx messages that QEMU and the host own are different and
QEMU has an invalid message, it doesn't make sense to update hardware
with QEMU's cached message. On the other hand, the message data should
be restored to HW by somebody, and the scenario is specific to VFIO
PCI. So it sounds fair to have the VFIO PCI driver restore the message,
as we did. As you said, it's ugly for a driver to write the MSIx
message. I'm not sure.

In the guest itself, the PCI code is consistent and I don't think there
is anything we need to improve here: pci_save_state(), reset,
pci_restore_state() should work fine. On the host side, we could
probably restore the MSIx message in request_irq(). Doing it in the IRQ
chip callbacks (e.g. startup, unmask) would add the overhead of
restoring the MSIx message, which is totally unnecessary for the host
itself.

Hopefully I've made myself clear this time :-)

Thanks,
Gavin
On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote:
>[...]

Alex, please let me know if I need to resend this one to you. The patch
has been pending for a long time, and I'm not sure whether you can
still grab it from somewhere.

As you might see, Bjorn will take that one with the PCI changes. This
patch depends on those changes.

Thanks,
Gavin
On Wed, Sep 10, 2014 at 06:13:42PM +1000, Gavin Shan wrote:
>On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote:
>>[...]
>
>Alex, please let me know if I need to resend this one to you. The patch
>has been pending for a long time, and I'm not sure whether you can
>still grab it from somewhere.
>
>As you might see, Bjorn will take that one with the PCI changes. This
>patch depends on those changes.
>

Alex, I guess you probably missed my last reply. Bjorn acked the first
patch, so you can pick up both of them if I understand correctly.
Please let me know if I need to resend those 2 patches.

Thanks,
Gavin
On Fri, 2014-09-26 at 13:19 +1000, Gavin Shan wrote:
> On Wed, Sep 10, 2014 at 06:13:42PM +1000, Gavin Shan wrote:
> >On Mon, May 19, 2014 at 01:01:10PM +1000, Gavin Shan wrote:
> >>[...]
> >
> >Alex, please let me know if I need to resend this one to you. The patch
> >has been pending for a long time, and I'm not sure whether you can
> >still grab it from somewhere.
> >
> >As you might see, Bjorn will take that one with the PCI changes. This
> >patch depends on those changes.
> >
>
> Alex, I guess you probably missed my last reply. Bjorn acked the first
> patch, so you can pick up both of them if I understand correctly.
> Please let me know if I need to resend those 2 patches.

Please update the patches, add Bjorn's ACK, test and resend. I'd like
to at least know that it still applies and resolves the problem on the
current code base since the patch is 4 months old. Thanks,

Alex
On Thu, Sep 25, 2014 at 09:46:44PM -0600, Alex Williamson wrote:
>On Fri, 2014-09-26 at 13:19 +1000, Gavin Shan wrote:
>> [...]
>
>Please update the patches, add Bjorn's ACK, test and resend. I'd like
>to at least know that it still applies and resolves the problem on the
>current code base since the patch is 4 months old. Thanks,
>

Retested, and it still helps avoid the unexpected EEH error as before,
though the error caused by the lost MSIx message is eventually
propagated to the guest and the adapter is recovered successfully by
the "EEH support for guest" feature. I'll resend it with Bjorn's ack.

Thanks,
Gavin
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 9dd49c9..553212f 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -16,6 +16,7 @@
 #include <linux/device.h>
 #include <linux/interrupt.h>
 #include <linux/eventfd.h>
+#include <linux/msi.h>
 #include <linux/pci.h>
 #include <linux/file.h>
 #include <linux/poll.h>
@@ -548,6 +549,20 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
 		return PTR_ERR(trigger);
 	}
 
+	/*
+	 * The MSIx vector table resides in device memory which may be cleared
+	 * via backdoor resets. We don't allow direct access to the vector
+	 * table so even if a userspace driver attempts to save/restore around
+	 * such a reset it would be unsuccessful. To avoid this, restore the
+	 * cached value of the message prior to enabling.
+	 */
+	if (msix) {
+		struct msi_msg msg;
+
+		get_cached_msi_msg(irq, &msg);
+		write_msi_msg(irq, &msg);
+	}
+
 	ret = request_irq(irq, vfio_msihandler, 0,
 			  vdev->ctx[vector].name, trigger);
 	if (ret) {