Message ID | 1449823994-3356-4-git-send-email-xyjxie@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote: > Current vfio-pci implementation disallows to mmap MSI-X table in > case that user get to touch this directly. > > However, EEH mechanism could ensure that a given pci device > can only shoot the MSIs assigned for its PE and guest kernel also > would not write to MSI-X table in pci_enable_msix() because > para-virtualization on PPC64 platform. So MSI-X table is safe to > access directly from the guest with EEH mechanism enabled. The MSI-X table is paravirtualized on vfio in general and interrupt remapping theoretically protects against errant interrupts, so why is this PPC64 specific? We have the same safeguards on x86 if we want to decide they're sufficient. Offhand, the only way I can think that a device can touch the MSI-X table is via backdoors or p2p DMA with another device. > This patch adds support for this case and allow to mmap MSI-X > table if EEH is supported on PPC64 platform. > > And we also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify > userspace that it's safe to mmap MSI-X table. > > Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> > --- > drivers/vfio/pci/vfio_pci.c | 5 ++++- > drivers/vfio/pci/vfio_pci_private.h | 5 +++++ > include/uapi/linux/vfio.h | 2 ++ > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c > index dbcad99..85d9980 100644 > --- a/drivers/vfio/pci/vfio_pci.c > +++ b/drivers/vfio/pci/vfio_pci.c > @@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data, > if (vfio_pci_bar_page_aligned()) > info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED; > > + if (vfio_msix_table_mmap_enabled()) > + info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP; > + > info.num_regions = VFIO_PCI_NUM_REGIONS; > info.num_irqs = VFIO_PCI_NUM_IRQS; > > @@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) > if (phys_len < PAGE_SIZE || req_start + req_len > phys_len) > return -EINVAL; > > - if (index == vdev->msix_bar) { > + if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) { > /* > * Disallow mmaps overlapping the MSI-X table; users don't > * get to touch this directly. We could find somewhere > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h > index 319352a..835619e 100644 > --- a/drivers/vfio/pci/vfio_pci_private.h > +++ b/drivers/vfio/pci/vfio_pci_private.h > @@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void) > return IS_ENABLED(CONFIG_PPC64); > } > > +static inline bool vfio_msix_table_mmap_enabled(void) > +{ > + return IS_ENABLED(CONFIG_EEH); > +} I really dislike these. > + > extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev); > extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev); > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > index 1fc8066..289e662 100644 > --- a/include/uapi/linux/vfio.h > +++ b/include/uapi/linux/vfio.h > @@ -173,6 +173,8 @@ struct vfio_device_info { > #define VFIO_DEVICE_FLAGS_AMBA (1 << 3) /* vfio-amba device */ > /* Platform support all PCI MMIO BARs to be page aligned */ > #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED (1 << 4) > +/* Platform support mmapping PCI MSI-X vector table */ > +#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP (1 << 5) Again, not sure why this is on the device versus the region, but I'd prefer to investigate whether we can handle this with the sparse mmap capability (or lack of) in the capability chains I proposed[1]. 
Thanks, Alex [1] https://lkml.org/lkml/2015/11/23/748 > __u32 num_regions; /* Max region index + 1 */ > __u32 num_irqs; /* Max IRQ index + 1 */ > };
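To illustrate the capability-chain idea referenced above: instead of a device-global flag, each region would report which sub-ranges are actually mmap-able, so a BAR containing an MSI-X table could expose everything except the table (or, on a platform that considers it safe, the whole BAR). Below is a rough userspace sketch; the structure and macro names follow the sparse-mmap capability UAPI that later grew out of that proposal, so treat the details as illustrative rather than as the interface under review in this thread. Error handling is trimmed for brevity.

```c
/*
 * Sketch: ask for a region's info and, if the kernel reports a
 * capability chain, mmap only the sub-ranges it advertises.
 * Names follow the later sparse-mmap UAPI; illustrative only.
 */
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

static int mmap_advertised_areas(int device_fd, unsigned int region_index)
{
	struct vfio_region_info *info = calloc(1, sizeof(*info));

	info->argsz = sizeof(*info);
	info->index = region_index;

	/* First call reports how much room the capability chain needs. */
	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info) ||
	    !(info->flags & VFIO_REGION_INFO_FLAG_CAPS)) {
		free(info);
		return -1;	/* error, or no capability chain offered */
	}

	size_t argsz = info->argsz;
	info = realloc(info, argsz);
	if (!info)
		return -1;
	memset(info, 0, argsz);
	info->argsz = argsz;
	info->index = region_index;
	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info)) {
		free(info);
		return -1;
	}

	/* Walk the chain and mmap each advertised sub-range of the BAR. */
	for (__u32 off = info->cap_offset; off; ) {
		struct vfio_info_cap_header *hdr =
			(struct vfio_info_cap_header *)((char *)info + off);

		if (hdr->id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) {
			struct vfio_region_info_cap_sparse_mmap *sparse =
				(struct vfio_region_info_cap_sparse_mmap *)hdr;

			/* A real user would keep the returned pointers. */
			for (__u32 i = 0; i < sparse->nr_areas; i++)
				mmap(NULL, sparse->areas[i].size,
				     PROT_READ | PROT_WRITE, MAP_SHARED,
				     device_fd,
				     info->offset + sparse->areas[i].offset);
		}
		off = hdr->next;
	}
	free(info);
	return 0;
}
```

With this shape, "the MSI-X table is mappable" simply shows up as the region advertising one area that covers the whole BAR, with no extra device flag needed.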
> The MSI-X table is paravirtualized on vfio in general and interrupt > remapping theoretically protects against errant interrupts, so why is > this PPC64 specific? We have the same safeguards on x86 if we want to > decide they're sufficient. Offhand, the only way I can think that a > device can touch the MSI-X table is via backdoors or p2p DMA with > another device. Is this all related to the statements in the PCI(e) spec that the MSI-X table and Pending bit array should be in their own BARs? (ISTR it even suggests a BAR each.) Since the MSI-X table exists in device memory/registers there is nothing to stop the device modifying the table contents (or even ignoring the contents and writing address+data pairs that are known to reference the CPU's MSI-X interrupt generation logic). We've an fpga based PCIe slave that has some additional PCIe slaves (associated with the interrupt generation logic) that are currently next to the PBA (which is 8k from the MSI-X table). If we can't map the PBA we can't actually raise any interrupts. The same would be true if the page size is 64k and mapping the MSI-X table were banned. Do we need to change our PCIe slave address map so we don't need to access anything in the same page (which might be 64k were we to target large ppc - which we don't at the moment) as both the MSI-X table and the PBA? I'd also note that being able to read the MSI-X table is a useful diagnostic that the relevant interrupts are enabled properly. David
On 2015/12/17 4:14, Alex Williamson wrote: > On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote: >> Current vfio-pci implementation disallows to mmap MSI-X table in >> case that user get to touch this directly. >> >> However, EEH mechanism could ensure that a given pci device >> can only shoot the MSIs assigned for its PE and guest kernel also >> would not write to MSI-X table in pci_enable_msix() because >> para-virtualization on PPC64 platform. So MSI-X table is safe to >> access directly from the guest with EEH mechanism enabled. > The MSI-X table is paravirtualized on vfio in general and interrupt > remapping theoretically protects against errant interrupts, so why is > this PPC64 specific? We have the same safeguards on x86 if we want to > decide they're sufficient. Offhand, the only way I can think that a > device can touch the MSI-X table is via backdoors or p2p DMA with > another device. Maybe I didn't make my point clear. The reasons why we can mmap the MSI-X table on PPC64 are: 1. The EEH mechanism could ensure that a given pci device can only shoot the MSIs assigned for its PE. So it would not do harm to other memory space when the guest writes a garbage MSI-X address/data to the vector table if we pass the MSI-X tables through to the guest. 2. The guest kernel would not write to the MSI-X table on the PPC64 platform when device drivers call pci_enable_msix() to initialize MSI-X interrupts. So I think it is safe to mmap/passthrough the MSI-X table on the PPC64 platform. And I'm not sure whether other architectures can ensure these two points. Thanks. Regards Yongji Xie >> This patch adds support for this case and allow to mmap MSI-X >> table if EEH is supported on PPC64 platform. >> >> And we also add a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify >> userspace that it's safe to mmap MSI-X table. >> >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >> --- >> drivers/vfio/pci/vfio_pci.c | 5 ++++- >> drivers/vfio/pci/vfio_pci_private.h | 5 +++++ >> include/uapi/linux/vfio.h | 2 ++ >> 3 files changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c >> index dbcad99..85d9980 100644 >> --- a/drivers/vfio/pci/vfio_pci.c >> +++ b/drivers/vfio/pci/vfio_pci.c >> @@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data, >> if (vfio_pci_bar_page_aligned()) >> info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED; >> >> + if (vfio_msix_table_mmap_enabled()) >> + info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP; >> + >> info.num_regions = VFIO_PCI_NUM_REGIONS; >> info.num_irqs = VFIO_PCI_NUM_IRQS; >> >> @@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) >> if (phys_len < PAGE_SIZE || req_start + req_len > phys_len) >> return -EINVAL; >> >> - if (index == vdev->msix_bar) { >> + if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) { >> /* >> * Disallow mmaps overlapping the MSI-X table; users don't >> * get to touch this directly. We could find somewhere >> diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h >> index 319352a..835619e 100644 >> --- a/drivers/vfio/pci/vfio_pci_private.h >> +++ b/drivers/vfio/pci/vfio_pci_private.h >> @@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void) >> return IS_ENABLED(CONFIG_PPC64); >> } >> >> +static inline bool vfio_msix_table_mmap_enabled(void) >> +{ >> + return IS_ENABLED(CONFIG_EEH); >> +} > I really dislike these.
> >> + >> extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev); >> extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev); >> >> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >> index 1fc8066..289e662 100644 >> --- a/include/uapi/linux/vfio.h >> +++ b/include/uapi/linux/vfio.h >> @@ -173,6 +173,8 @@ struct vfio_device_info { >> #define VFIO_DEVICE_FLAGS_AMBA (1 << 3) /* vfio-amba device */ >> /* Platform support all PCI MMIO BARs to be page aligned */ >> #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED (1 << 4) >> +/* Platform support mmapping PCI MSI-X vector table */ >> +#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP (1 << 5) > Again, not sure why this is on the device versus the region, but I'd > prefer to investigate whether we can handle this with the sparse mmap > capability (or lack of) in the capability chains I proposed[1]. Thanks, > > Alex > > [1] https://lkml.org/lkml/2015/11/23/748 > Good idea! I will investigate it. Thanks. Regards Yongji Xie >> __u32 num_regions; /* Max region index + 1 */ >> __u32 num_irqs; /* Max IRQ index + 1 */ >> };
On Thu, 2015-12-17 at 10:08 +0000, David Laight wrote: > > The MSI-X table is paravirtualized on vfio in general and interrupt > > remapping theoretically protects against errant interrupts, so why > > is > > this PPC64 specific? We have the same safeguards on x86 if we want > > to > > decide they're sufficient. Offhand, the only way I can think that a > > device can touch the MSI-X table is via backdoors or p2p DMA with > > another device. > > Is this all related to the statements in the PCI(e) spec that the > MSI-X table and Pending bit array should in their own BARs? > (ISTR it even suggests a BAR each.) > > Since the MSI-X table exists in device memory/registers there is > nothing to stop the device modifying the table contents (or even > ignoring the contents and writing address+data pairs that are known > to reference the CPUs MSI-X interrupt generation logic). > > We've an fpga based PCIe slave that has some additional PCIe slaves > (associated with the interrupt generation logic) that are currently > next to the PBA (which is 8k from the MSI-X table). > If we can't map the PBA we can't actually raise any interrupts. > The same would be true if page size is 64k and mapping the MSI-X > table banned. > > Do we need to change our PCIe slave address map so we don't need > to access anything in the same page (which might be 64k were we to > target large ppc - which we don't at the moment) as both the > MSI-X table and the PBA? > > I'd also note that being able to read the MSI-X table is a useful > diagnostic that the relevant interrupts are enabled properly. Yes, the spec requirement is that MSI-X structures must reside in a 4k aligned area that doesn't overlap with other configuration registers for the device. It's only an advisement to put them into their own BAR, and 4k clearly wasn't as forward looking as we'd hope. Vfio doesn't particularly care about the PBA, but if it resides in the same host PAGE_SIZE area as the MSI-X vector table, you currently won't be able to get to it. Most devices are not at all dependent on the PBA for any sort of functionality. It's really more correct to say that both the vector table and PBA are emulated by QEMU than paravirtualized. Only PPC64 has the guest OS taking a paravirtual path to program the vector table, everyone else attempts to read/write to the device MMIO space, which gets trapped and emulated in QEMU. This is why the QEMU side patch has further ugly hacks to mess with the ordering of MemoryRegions since even if we can access and mmap the MSI-X vector table, we'll still trap into QEMU for emulation. How exactly does the ability to map the PBA affect your ability to raise an interrupt? I can only think that maybe you're writing PBA bits to clear them, but the spec indicates that software should never write to the PBA, only read, and that writes are undefined. So that would be very non-standard, QEMU drops writes, they don't even make it to the hardware. Thanks, Alex
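To make the "same host PAGE_SIZE area" constraint concrete: the MSI-X capability gives the vector table and PBA locations as BAR-relative offsets, so it is easy to compute whether the current vfio-pci restriction also takes the PBA away. A minimal sketch follows, assuming the caller has already read the three relevant MSI-X capability registers from config space; the helper name and calling convention are invented for illustration, only the register layout is the standard one.

```c
/*
 * Illustrative only: decide whether the PBA lives in the same host page
 * as (part of) the MSI-X vector table, in which case vfio-pci's current
 * "don't mmap the pages holding the vector table" rule also makes the
 * PBA unreachable through mmap.  Inputs: the Message Control word
 * (cap + 2), the Table Offset/BIR dword (cap + 4) and the PBA
 * Offset/BIR dword (cap + 8), as read from config space by the caller.
 */
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <linux/pci_regs.h>

static bool pba_shares_host_page_with_table(uint16_t msg_ctrl,
					    uint32_t table_reg,
					    uint32_t pba_reg)
{
	uint64_t page_mask = ~((uint64_t)sysconf(_SC_PAGESIZE) - 1);
	uint32_t nr_entries = (msg_ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
	uint64_t table_start = table_reg & PCI_MSIX_TABLE_OFFSET;
	uint64_t table_end = table_start + nr_entries * 16;  /* 16 bytes/entry */
	uint64_t pba_start = pba_reg & PCI_MSIX_PBA_OFFSET;

	/* Only comparable when table and PBA sit in the same BAR. */
	if ((table_reg & PCI_MSIX_TABLE_BIR) != (pba_reg & PCI_MSIX_PBA_BIR))
		return false;

	/* Does the PBA's first page overlap any page covered by the table? */
	return (pba_start & page_mask) >= (table_start & page_mask) &&
	       (pba_start & page_mask) <= ((table_end - 1) & page_mask);
}
```

For the FPGA layout described above (PBA 8k past the vector table) this returns true whenever the host page size is 64k, which is exactly the situation being discussed.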
On Thu, 2015-12-17 at 18:37 +0800, yongji xie wrote: > > On 2015/12/17 4:14, Alex Williamson wrote: > > On Fri, 2015-12-11 at 16:53 +0800, Yongji Xie wrote: > > > Current vfio-pci implementation disallows to mmap MSI-X table in > > > case that user get to touch this directly. > > > > > > However, EEH mechanism could ensure that a given pci device > > > can only shoot the MSIs assigned for its PE and guest kernel also > > > would not write to MSI-X table in pci_enable_msix() because > > > para-virtualization on PPC64 platform. So MSI-X table is safe to > > > access directly from the guest with EEH mechanism enabled. > > The MSI-X table is paravirtualized on vfio in general and interrupt > > remapping theoretically protects against errant interrupts, so why > > is > > this PPC64 specific? We have the same safeguards on x86 if we want > > to > > decide they're sufficient. Offhand, the only way I can think that > > a > > device can touch the MSI-X table is via backdoors or p2p DMA with > > another device. > Maybe I didn't make my point clear. The reasons why we can mmap MSI-X > table on PPC64 are? > > 1. EEH mechanism could ensure that a given pci device can only shoot > the MSIs assigned for its PE. So it would not do harm to other memory > space when the guest write a garbage MSI-X address/data to the vector > table > if we passthough MSI-X tables to guest. Interrupt remapping does the same on x86. > 2. The guest kernel would not write to MSI-X table on PPC64 platform > when device drivers call pci_enable_msix() to initialize MSI-X > interrupts. This is irrelevant to the vfio API. vfio is a userspace driver interface, QEMU is just one possible consumer of the interface. Even in the case of PPC64 & QEMU, the guest is still capable of writing to the vector table, it just probably won't. > So I think it is safe to mmap/passthrough MSI-X table on PPC64 > platform. > And I'm not sure whether other architectures can ensure these two > points. There is another consideration, which is the API exposed to the user. vfio currently enforces interrupt setup through ioctls by making the PCI mechanisms for interrupt programming inaccessible through the device regions. Ignoring that you are only focused on PPC64 with QEMU, does it make sense for the vfio API to allow a user to manipulate interrupt programming in a way that not only will not work, but in a way that we expect to fail and require error isolation to recover from? I can't say I'm fully convinced that a footnote in the documentation is sufficient for that. Thanks, Alex
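For context, the ioctl path referred to here is VFIO_DEVICE_SET_IRQS: a vfio user hands the kernel an eventfd per MSI-X vector and the host programs the physical vector table, so nothing ever needs to be written through the BAR. A minimal sketch of wiring one vector, with error handling trimmed; it illustrates the shape of the API rather than any particular consumer.

```c
/*
 * Sketch: back one MSI-X vector with an eventfd via VFIO_DEVICE_SET_IRQS.
 * The kernel writes the real MSI-X table entry; userspace only reads the
 * eventfd when the interrupt fires.
 */
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int wire_msix_vector_to_eventfd(int device_fd, unsigned int vector)
{
	char buf[sizeof(struct vfio_irq_set) + sizeof(int)];
	struct vfio_irq_set *irq_set = (struct vfio_irq_set *)buf;
	int efd = eventfd(0, 0);

	if (efd < 0)
		return -1;

	memset(buf, 0, sizeof(buf));
	irq_set->argsz = sizeof(buf);
	irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
	irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
	irq_set->start = vector;	/* which vector this eventfd backs */
	irq_set->count = 1;		/* one eventfd follows in data[] */
	memcpy(irq_set->data, &efd, sizeof(int));

	return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, irq_set);
}
```

Allowing direct writes to the vector table through mmap would bypass this path entirely, which is the API concern raised above.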
On Thu, 2015-12-17 at 14:41 -0700, Alex Williamson wrote: > > So I think it is safe to mmap/passthrough MSI-X table on PPC64 > > platform. > > And I'm not sure whether other architectures can ensure these two > > points. > > There is another consideration, which is the API exposed to the user. > vfio currently enforces interrupt setup through ioctls by making the > PCI mechanisms for interrupt programming inaccessible through the > device regions. Ignoring that you are only focused on PPC64 with QEMU, > does it make sense for the vfio API to allow a user to manipulate > interrupt programming in a way that not only will not work, but in a > way that we expect to fail and require error isolation to recover from? > I can't say I'm fully convinced that a footnote in the documentation > is sufficient for that. Thanks, Well, one could argue that the "isolation" provided by qemu here is fairly weak anyway ;-) I mean... how do you know the device doesn't have a backdoor path into that table via some other MMIO registers anyway? In any case, the HW isolation on platforms like pseries means that the worst the guest can do is shoot itself in the foot. Big deal. On the other hand, not bothering with intercepting the table has benefits, such as reducing the memory region clutter, but also removing all the massive performance problems we see because adapters have critical registers in the same 64K page as the MSI-X table. So I don't think there is any question here that we *need* that functionality in power. The filtering of the table by Qemu doesn't provide any practical benefit, it just gets in the way. Cheers, Ben.
From: Alex Williamson > Sent: 17 December 2015 21:07 ... > > Is this all related to the statements in the PCI(e) spec that the > > MSI-X table and Pending bit array should in their own BARs? > > (ISTR it even suggests a BAR each.) > > > > Since the MSI-X table exists in device memory/registers there is > > nothing to stop the device modifying the table contents (or even > > ignoring the contents and writing address+data pairs that are known > > to reference the CPUs MSI-X interrupt generation logic). > > > > We've an fpga based PCIe slave that has some additional PCIe slaves > > (associated with the interrupt generation logic) that are currently > > next to the PBA (which is 8k from the MSI-X table). > > If we can't map the PBA we can't actually raise any interrupts. > > The same would be true if page size is 64k and mapping the MSI-X > > table banned. > > > > Do we need to change our PCIe slave address map so we don't need > > to access anything in the same page (which might be 64k were we to > > target large ppc - which we don't at the moment) as both the > > MSI-X table and the PBA? > > > > I'd also note that being able to read the MSI-X table is a useful > > diagnostic that the relevant interrupts are enabled properly. > > Yes, the spec requirement is that MSI-X structures must reside in a 4k > aligned area that doesn't overlap with other configuration registers > for the device. It's only an advisement to put them into their own > BAR, and 4k clearly wasn't as forward looking as we'd hope. Vfio > doesn't particularly care about the PBA, but if it resides in the same > host PAGE_SIZE area as the MSI-X vector table, you currently won't be > able to get to it. Most devices are not at all dependent on the PBA > for any sort of functionality. Having some hint in the spec as to why these mapping rules might be needed would have been useful. > It's really more correct to say that both the vector table and PBA are > emulated by QEMU than paravirtualized. Only PPC64 has the guest OS > taking a paravirtual path to program the vector table, everyone else > attempts to read/write to the device MMIO space, which gets trapped and > emulated in QEMU. This is why the QEMU side patch has further ugly > hacks to mess with the ordering of MemoryRegions since even if we can > access and mmap the MSI-X vector table, we'll still trap into QEMU for > emulation. Thanks for that explanation. > How exactly does the ability to map the PBA affect your ability to > raise an interrupt? There are other registers for the logic block that converts internal interrupt requests into the PCIe writes in the locations following the PBA. These include interrupt enable bits, and the ability to remove pending interrupt requests (and other stuff for testing the interrupt block). Yes I know I probably shouldn't have done that, but it all worked. At least it is better than the previous version of the hardware that required the driver read back the MSI-X table entries in order to set up an on-board PTE to convert a 32bit on-board address to the 64bit PCIe address needed for the MSI-X. I'll 'fix' our board by making both the MSI-X table and PBA area accessible through one of the other BARs. (Annoyingly this will be confusing because the BAR offsets will have to differ.) Note that this makes a complete mockery of disallowing the mapping in the first place. David
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index dbcad99..85d9980 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -446,6 +446,9 @@ static long vfio_pci_ioctl(void *device_data,
 		if (vfio_pci_bar_page_aligned())
 			info.flags |= VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED;
 
+		if (vfio_msix_table_mmap_enabled())
+			info.flags |= VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP;
+
 		info.num_regions = VFIO_PCI_NUM_REGIONS;
 		info.num_irqs = VFIO_PCI_NUM_IRQS;
 
@@ -871,7 +874,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
 	if (phys_len < PAGE_SIZE || req_start + req_len > phys_len)
 		return -EINVAL;
 
-	if (index == vdev->msix_bar) {
+	if (index == vdev->msix_bar && !vfio_msix_table_mmap_enabled()) {
 		/*
 		 * Disallow mmaps overlapping the MSI-X table; users don't
 		 * get to touch this directly.  We could find somewhere
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 319352a..835619e 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -74,6 +74,11 @@ static inline bool vfio_pci_bar_page_aligned(void)
 	return IS_ENABLED(CONFIG_PPC64);
 }
 
+static inline bool vfio_msix_table_mmap_enabled(void)
+{
+	return IS_ENABLED(CONFIG_EEH);
+}
+
 extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
 extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 1fc8066..289e662 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -173,6 +173,8 @@ struct vfio_device_info {
 #define VFIO_DEVICE_FLAGS_AMBA	(1 << 3)	/* vfio-amba device */
 /* Platform support all PCI MMIO BARs to be page aligned */
 #define VFIO_DEVICE_FLAGS_PCI_PAGE_ALIGNED	(1 << 4)
+/* Platform support mmapping PCI MSI-X vector table */
+#define VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP	(1 << 5)
 	__u32	num_regions;	/* Max region index + 1 */
 	__u32	num_irqs;	/* Max IRQ index + 1 */
 };
The current vfio-pci implementation disallows mmap of the MSI-X table so that the user does not get to touch it directly.

However, on the PPC64 platform the EEH mechanism ensures that a given PCI device can only shoot the MSIs assigned to its PE, and because of para-virtualization the guest kernel does not write the MSI-X table in pci_enable_msix(). So the MSI-X table is safe to access directly from the guest when EEH is enabled.

This patch adds support for this case and allows mmap of the MSI-X table if EEH is supported on the PPC64 platform. It also adds a VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP flag to notify userspace that it is safe to mmap the MSI-X table.

Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
---
 drivers/vfio/pci/vfio_pci.c | 5 ++++-
 drivers/vfio/pci/vfio_pci_private.h | 5 +++++
 include/uapi/linux/vfio.h | 2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)
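If the flag-based approach above were taken as posted, userspace discovery would look roughly like the sketch below. It uses VFIO_DEVICE_GET_INFO and the VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP bit exactly as defined in the uapi hunk above, so it assumes a <linux/vfio.h> patched accordingly; it is an illustration of how a consumer would probe the flag, not part of the patch.

```c
/*
 * Sketch: probe the device-info flag added by this patch before deciding
 * whether to attempt mmap() of the BAR that holds the MSI-X table.
 * Assumes <linux/vfio.h> already carries the proposed
 * VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP definition.
 */
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static bool msix_table_mmap_allowed(int device_fd)
{
	struct vfio_device_info info;

	memset(&info, 0, sizeof(info));
	info.argsz = sizeof(info);

	if (ioctl(device_fd, VFIO_DEVICE_GET_INFO, &info))
		return false;

	/* Set only when the platform (here: PPC64 with EEH) isolates errant MSIs. */
	return info.flags & VFIO_DEVICE_FLAGS_PCI_MSIX_MMAP;
}
```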