Message ID | 1461761010-5452-6-git-send-email-xyjxie@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> From: Yongji Xie > Sent: Wednesday, April 27, 2016 8:43 PM > > This patch enables mmapping MSI-X tables if hardware supports > interrupt remapping which can ensure that a given pci device > can only shoot the MSIs assigned for it. > > With MSI-X table mmapped, we also need to expose the > read/write interface which will be used to access MSI-X table. > > Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> A curious question here. Does "allow to mmap MSI-X" essentially mean that KVM guest can directly read/write physical MSI-X structure then? Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2016/5/3 13:34, Tian, Kevin wrote: >> From: Yongji Xie >> Sent: Wednesday, April 27, 2016 8:43 PM >> >> This patch enables mmapping MSI-X tables if hardware supports >> interrupt remapping which can ensure that a given pci device >> can only shoot the MSIs assigned for it. >> >> With MSI-X table mmapped, we also need to expose the >> read/write interface which will be used to access MSI-X table. >> >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> > A curious question here. Does "allow to mmap MSI-X" essentially > mean that KVM guest can directly read/write physical MSI-X > structure then? > > Thanks > Kevin > Here we just allow to mmap MSI-X table in kernel. It doesn't mean all KVM guest can directly read/write physical MSI-X structure. This should be decided by QEMU. For PPC64 platform, we would allow to passthrough the MSI-X table because we know guest kernel would not write physical MSI-X structure when enabling MSI. Thanks, Yongji -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > Sent: Tuesday, May 03, 2016 2:08 PM > > On 2016/5/3 13:34, Tian, Kevin wrote: > > >> From: Yongji Xie > >> Sent: Wednesday, April 27, 2016 8:43 PM > >> > >> This patch enables mmapping MSI-X tables if hardware supports > >> interrupt remapping which can ensure that a given pci device > >> can only shoot the MSIs assigned for it. > >> > >> With MSI-X table mmapped, we also need to expose the > >> read/write interface which will be used to access MSI-X table. > >> > >> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> > > A curious question here. Does "allow to mmap MSI-X" essentially > > mean that KVM guest can directly read/write physical MSI-X > > structure then? > > > > Thanks > > Kevin > > > > Here we just allow to mmap MSI-X table in kernel. It doesn't > mean all KVM guest can directly read/write physical MSI-X > structure. This should be decided by QEMU. For PPC64 > platform, we would allow to passthrough the MSI-X table > because we know guest kernel would not write physical > MSI-X structure when enabling MSI. > A bit confused here. If guest kernel doesn't need to write physical MSI-X structure, what's the point of passing through the table then? I think the key whether MSI-X table can be passed through is related to where hypervisor control is deployed. At least for x86: - When irq remapping is not enabled, host/hypervisor needs to control physical interrupt message including vector/dest/etc. directly in MSI-X structure, so we cannot allow a guest to access it; - when irq remapping is enabled, host/hypervisor can control interrupt routing in irq remapping table. However MSI-X also needs to be configured as remappable format. In this manner we also cannot allow direct access from guest. The only sane case to pass through MSI-X structure, is a mechanism similar to irq remapping but w/o need to change original MSI-X format so direct access from guest side is safe. Is it the case in PPC64? Thanks Kevin
On 2016/5/3 14:22, Tian, Kevin wrote: >> From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] >> Sent: Tuesday, May 03, 2016 2:08 PM >> >> On 2016/5/3 13:34, Tian, Kevin wrote: >> >>>> From: Yongji Xie >>>> Sent: Wednesday, April 27, 2016 8:43 PM >>>> >>>> This patch enables mmapping MSI-X tables if hardware supports >>>> interrupt remapping which can ensure that a given pci device >>>> can only shoot the MSIs assigned for it. >>>> >>>> With MSI-X table mmapped, we also need to expose the >>>> read/write interface which will be used to access MSI-X table. >>>> >>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >>> A curious question here. Does "allow to mmap MSI-X" essentially >>> mean that KVM guest can directly read/write physical MSI-X >>> structure then? >>> >>> Thanks >>> Kevin >>> >> Here we just allow to mmap MSI-X table in kernel. It doesn't >> mean all KVM guest can directly read/write physical MSI-X >> structure. This should be decided by QEMU. For PPC64 >> platform, we would allow to passthrough the MSI-X table >> because we know guest kernel would not write physical >> MSI-X structure when enabling MSI. >> > A bit confused here. If guest kernel doesn't need to write > physical MSI-X structure, what's the point of passing through > the table then? We want to allow the MSI-X table because there may be some critical registers in the same page as the MSI-X table. We have to handle the mmio access to these register in QEMU rather than in guest if mmapping MSI-X table is disallowed. > I think the key whether MSI-X table can be passed through > is related to where hypervisor control is deployed. At least > for x86: > > - When irq remapping is not enabled, host/hypervisor needs > to control physical interrupt message including vector/dest/etc. > directly in MSI-X structure, so we cannot allow a guest to > access it; > > - when irq remapping is enabled, host/hypervisor can control > interrupt routing in irq remapping table. However MSI-X > also needs to be configured as remappable format. In this > manner we also cannot allow direct access from guest. > > The only sane case to pass through MSI-X structure, is a > mechanism similar to irq remapping but w/o need to change > original MSI-X format so direct access from guest side is > safe. Is it the case in PPC64? > > Thanks > Kevin Acutually, we are not aimed at accessing MSI-X table from guest. So I think it's safe to passthrough MSI-X table if we can make sure guest kernel would not touch MSI-X table in normal code path such as para-virtualized guest kernel on PPC64. Thanks, Yongji -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
PiBGcm9tOiBZb25namkgWGllDQo+IFNlbnQ6IFR1ZXNkYXksIE1heSAwMywgMjAxNiAzOjM0IFBN DQo+IA0KPiBPbiAyMDE2LzUvMyAxNDoyMiwgVGlhbiwgS2V2aW4gd3JvdGU6DQo+IA0KPiA+PiBG cm9tOiBZb25namkgWGllIFttYWlsdG86eHlqeGllQGxpbnV4LnZuZXQuaWJtLmNvbV0NCj4gPj4g U2VudDogVHVlc2RheSwgTWF5IDAzLCAyMDE2IDI6MDggUE0NCj4gPj4NCj4gPj4gT24gMjAxNi81 LzMgMTM6MzQsIFRpYW4sIEtldmluIHdyb3RlOg0KPiA+Pg0KPiA+Pj4+IEZyb206IFlvbmdqaSBY aWUNCj4gPj4+PiBTZW50OiBXZWRuZXNkYXksIEFwcmlsIDI3LCAyMDE2IDg6NDMgUE0NCj4gPj4+ Pg0KPiA+Pj4+IFRoaXMgcGF0Y2ggZW5hYmxlcyBtbWFwcGluZyBNU0ktWCB0YWJsZXMgaWYgaGFy ZHdhcmUgc3VwcG9ydHMNCj4gPj4+PiBpbnRlcnJ1cHQgcmVtYXBwaW5nIHdoaWNoIGNhbiBlbnN1 cmUgdGhhdCBhIGdpdmVuIHBjaSBkZXZpY2UNCj4gPj4+PiBjYW4gb25seSBzaG9vdCB0aGUgTVNJ cyBhc3NpZ25lZCBmb3IgaXQuDQo+ID4+Pj4NCj4gPj4+PiBXaXRoIE1TSS1YIHRhYmxlIG1tYXBw ZWQsIHdlIGFsc28gbmVlZCB0byBleHBvc2UgdGhlDQo+ID4+Pj4gcmVhZC93cml0ZSBpbnRlcmZh Y2Ugd2hpY2ggd2lsbCBiZSB1c2VkIHRvIGFjY2VzcyBNU0ktWCB0YWJsZS4NCj4gPj4+Pg0KPiA+ Pj4+IFNpZ25lZC1vZmYtYnk6IFlvbmdqaSBYaWUgPHh5anhpZUBsaW51eC52bmV0LmlibS5jb20+ DQo+ID4+PiBBIGN1cmlvdXMgcXVlc3Rpb24gaGVyZS4gRG9lcyAiYWxsb3cgdG8gbW1hcCBNU0kt WCIgZXNzZW50aWFsbHkNCj4gPj4+IG1lYW4gdGhhdCBLVk0gZ3Vlc3QgY2FuIGRpcmVjdGx5IHJl YWQvd3JpdGUgcGh5c2ljYWwgTVNJLVgNCj4gPj4+IHN0cnVjdHVyZSB0aGVuPw0KPiA+Pj4NCj4g Pj4+IFRoYW5rcw0KPiA+Pj4gS2V2aW4NCj4gPj4+DQo+ID4+IEhlcmUgd2UganVzdCBhbGxvdyB0 byBtbWFwIE1TSS1YIHRhYmxlIGluIGtlcm5lbC4gSXQgZG9lc24ndA0KPiA+PiBtZWFuIGFsbCBL Vk0gZ3Vlc3QgY2FuIGRpcmVjdGx5IHJlYWQvd3JpdGUgcGh5c2ljYWwgTVNJLVgNCj4gPj4gc3Ry dWN0dXJlLiBUaGlzIHNob3VsZCBiZSBkZWNpZGVkIGJ5IFFFTVUuIEZvciBQUEM2NA0KPiA+PiBw bGF0Zm9ybSwgd2Ugd291bGQgYWxsb3cgdG8gcGFzc3Rocm91Z2ggdGhlIE1TSS1YIHRhYmxlDQo+ ID4+IGJlY2F1c2Ugd2Uga25vdyBndWVzdCBrZXJuZWwgd291bGQgbm90IHdyaXRlIHBoeXNpY2Fs DQo+ID4+IE1TSS1YIHN0cnVjdHVyZSB3aGVuIGVuYWJsaW5nIE1TSS4NCj4gPj4NCj4gPiBBIGJp dCBjb25mdXNlZCBoZXJlLiBJZiBndWVzdCBrZXJuZWwgZG9lc24ndCBuZWVkIHRvIHdyaXRlDQo+ ID4gcGh5c2ljYWwgTVNJLVggc3RydWN0dXJlLCB3aGF0J3MgdGhlIHBvaW50IG9mIHBhc3Npbmcg dGhyb3VnaA0KPiA+IHRoZSB0YWJsZSB0aGVuPw0KPiANCj4gV2Ugd2FudCB0byBhbGxvdyB0aGUg TVNJLVggdGFibGUgYmVjYXVzZSB0aGVyZSBtYXkgYmUNCj4gc29tZSBjcml0aWNhbCByZWdpc3Rl cnMgaW4gdGhlIHNhbWUgcGFnZSBhcyB0aGUgTVNJLVggdGFibGUuDQo+IFdlIGhhdmUgdG8gaGFu ZGxlIHRoZSBtbWlvIGFjY2VzcyB0byB0aGVzZSByZWdpc3RlciBpbiBRRU1VDQo+IHJhdGhlciB0 aGFuIGluIGd1ZXN0IGlmIG1tYXBwaW5nIE1TSS1YIHRhYmxlIGlzIGRpc2FsbG93ZWQuDQoNClNv IHlvdSBtZWFuIGNyaXRpY2FsIHJlZ2lzdGVycyBpbiBzYW1lIE1NSU8gQkFSIGFzIE1TSS1YDQp0 YWJsZSwgaW5zdGVhZCBvZiB0d28gTU1JTyBCQVJzIGluIHNhbWUgcGFnZSAodGhlIGxhdHRlciBJ DQpzdXBwb3NlIHdpdGggeW91ciB3aG9sZSBwYXRjaHNldCBpdCB3b24ndCBoYXBwZW4gdGhlbik/ DQoNCj4gDQo+ID4gSSB0aGluayB0aGUga2V5IHdoZXRoZXIgTVNJLVggdGFibGUgY2FuIGJlIHBh c3NlZCB0aHJvdWdoDQo+ID4gaXMgcmVsYXRlZCB0byB3aGVyZSBoeXBlcnZpc29yIGNvbnRyb2wg aXMgZGVwbG95ZWQuIEF0IGxlYXN0DQo+ID4gZm9yIHg4NjoNCj4gPg0KPiA+IC0gV2hlbiBpcnEg cmVtYXBwaW5nIGlzIG5vdCBlbmFibGVkLCBob3N0L2h5cGVydmlzb3IgbmVlZHMNCj4gPiB0byBj b250cm9sIHBoeXNpY2FsIGludGVycnVwdCBtZXNzYWdlIGluY2x1ZGluZyB2ZWN0b3IvZGVzdC9l dGMuDQo+ID4gZGlyZWN0bHkgaW4gTVNJLVggc3RydWN0dXJlLCBzbyB3ZSBjYW5ub3QgYWxsb3cg YSBndWVzdCB0bw0KPiA+IGFjY2VzcyBpdDsNCj4gPg0KPiA+IC0gd2hlbiBpcnEgcmVtYXBwaW5n IGlzIGVuYWJsZWQsIGhvc3QvaHlwZXJ2aXNvciBjYW4gY29udHJvbA0KPiA+IGludGVycnVwdCBy b3V0aW5nIGluIGlycSByZW1hcHBpbmcgdGFibGUuIEhvd2V2ZXIgTVNJLVgNCj4gPiBhbHNvIG5l ZWRzIHRvIGJlIGNvbmZpZ3VyZWQgYXMgcmVtYXBwYWJsZSBmb3JtYXQuIEluIHRoaXMNCj4gPiBt YW5uZXIgd2UgYWxzbyBjYW5ub3QgYWxsb3cgZGlyZWN0IGFjY2VzcyBmcm9tIGd1ZXN0Lg0KPiA+ DQo+ID4gVGhlIG9ubHkgc2FuZSBjYXNlIHRvIHBhc3MgdGhyb3VnaCBNU0ktWCBzdHJ1Y3R1cmUs IGlzIGENCj4gPiBtZWNoYW5pc20gc2ltaWxhciB0byBpcnEgcmVtYXBwaW5nIGJ1dCB3L28gbmVl ZCB0byBjaGFuZ2UNCj4gPiBvcmlnaW5hbCBNU0ktWCBmb3JtYXQgc28gZGlyZWN0IGFjY2VzcyBm cm9tIGd1ZXN0IHNpZGUgaXMNCj4gPiBzYWZlLiBJcyBpdCB0aGUgY2FzZSBpbiBQUEM2ND8NCj4g Pg0KPiA+IFRoYW5rcw0KPiA+IEtldmluDQo+IA0KPiBBY3V0dWFsbHksIHdlIGFyZSBub3QgYWlt ZWQgYXQgYWNjZXNzaW5nIE1TSS1YIHRhYmxlIGZyb20NCj4gZ3Vlc3QuIFNvIEkgdGhpbmsgaXQn cyBzYWZlIHRvIHBhc3N0aHJvdWdoIE1TSS1YIHRhYmxlIGlmIHdlDQo+IGNhbiBtYWtlIHN1cmUg Z3Vlc3Qga2VybmVsIHdvdWxkIG5vdCB0b3VjaCBNU0ktWCB0YWJsZSBpbg0KPiBub3JtYWwgY29k ZSBwYXRoIHN1Y2ggYXMgcGFyYS12aXJ0dWFsaXplZCBndWVzdCBrZXJuZWwgb24gUFBDNjQuDQo+ IA0KDQpUaGVuIGhvdyBkbyB5b3UgcHJldmVudCBtYWxpY2lvdXMgZ3Vlc3Qga2VybmVsIGFjY2Vz c2luZyBpdD8NCg0KVGhhbmtzDQpLZXZpbg0K -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RnJvbTogVGlhbiwgS2V2aW4NCj4gU2VudDogMDUgTWF5IDIwMTYgMTA6MzcNCi4uLg0KPiA+IEFj dXR1YWxseSwgd2UgYXJlIG5vdCBhaW1lZCBhdCBhY2Nlc3NpbmcgTVNJLVggdGFibGUgZnJvbQ0K PiA+IGd1ZXN0LiBTbyBJIHRoaW5rIGl0J3Mgc2FmZSB0byBwYXNzdGhyb3VnaCBNU0ktWCB0YWJs ZSBpZiB3ZQ0KPiA+IGNhbiBtYWtlIHN1cmUgZ3Vlc3Qga2VybmVsIHdvdWxkIG5vdCB0b3VjaCBN U0ktWCB0YWJsZSBpbg0KPiA+IG5vcm1hbCBjb2RlIHBhdGggc3VjaCBhcyBwYXJhLXZpcnR1YWxp emVkIGd1ZXN0IGtlcm5lbCBvbiBQUEM2NC4NCj4gPg0KPiANCj4gVGhlbiBob3cgZG8geW91IHBy ZXZlbnQgbWFsaWNpb3VzIGd1ZXN0IGtlcm5lbCBhY2Nlc3NpbmcgaXQ/DQoNCk9yIGEgbWFsaWNp b3VzIGd1ZXN0IGRyaXZlciBmb3IgYW4gZXRoZXJuZXQgY2FyZCBzZXR0aW5nIHVwDQp0aGUgcmVj ZWl2ZSBidWZmZXIgcmluZyB0byBjb250YWluIGEgc2luZ2xlIHdvcmQgZW50cnkgdGhhdA0KY29u dGFpbnMgdGhlIGFkZHJlc3MgYXNzb2NpYXRlZCB3aXRoIGFuIE1TSS1YIGludGVycnVwdCBhbmQN CnRoZW4gdXNpbmcgYSBsb29wYmFjayBtb2RlIHRvIGNhdXNlIGEgc3BlY2lmaWMgcGFja2V0IGJl DQpyZWNlaXZlZCB0aGF0IHdyaXRlcyB0aGUgcmVxdWlyZWQgd29yZCB0aHJvdWdoIHRoYXQgYWRk cmVzcy4NCg0KUmVtZW1iZXIgdGhlIFBDSWUgY3ljbGUgZm9yIGFuIGludGVycnVwdCBpcyBhIG5v cm1hbCBtZW1vcnkgd3JpdGUNCmN5Y2xlLg0KDQoJRGF2aWQNCg0K -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi David and Kevin, On 2016/5/5 17:54, David Laight wrote: > From: Tian, Kevin >> Sent: 05 May 2016 10:37 > ... >>> Acutually, we are not aimed at accessing MSI-X table from >>> guest. So I think it's safe to passthrough MSI-X table if we >>> can make sure guest kernel would not touch MSI-X table in >>> normal code path such as para-virtualized guest kernel on PPC64. >>> >> Then how do you prevent malicious guest kernel accessing it? > Or a malicious guest driver for an ethernet card setting up > the receive buffer ring to contain a single word entry that > contains the address associated with an MSI-X interrupt and > then using a loopback mode to cause a specific packet be > received that writes the required word through that address. > > Remember the PCIe cycle for an interrupt is a normal memory write > cycle. > > David > If we have enough permission to load a malicious driver or kernel, we can easily break the guest without exposed MSI-X table. I think it should be safe to expose MSI-X table if we can make sure that malicious guest driver/kernel can't use the MSI-X table to break other guest or host. The capability of IRQ remapping could provide this kind of protection. Thanks, Yongji -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2016/5/5 17:36, Tian, Kevin wrote: >> From: Yongji Xie >> Sent: Tuesday, May 03, 2016 3:34 PM >> >> On 2016/5/3 14:22, Tian, Kevin wrote: >> >>>> From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] >>>> Sent: Tuesday, May 03, 2016 2:08 PM >>>> >>>> On 2016/5/3 13:34, Tian, Kevin wrote: >>>> >>>>>> From: Yongji Xie >>>>>> Sent: Wednesday, April 27, 2016 8:43 PM >>>>>> >>>>>> This patch enables mmapping MSI-X tables if hardware supports >>>>>> interrupt remapping which can ensure that a given pci device >>>>>> can only shoot the MSIs assigned for it. >>>>>> >>>>>> With MSI-X table mmapped, we also need to expose the >>>>>> read/write interface which will be used to access MSI-X table. >>>>>> >>>>>> Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> >>>>> A curious question here. Does "allow to mmap MSI-X" essentially >>>>> mean that KVM guest can directly read/write physical MSI-X >>>>> structure then? >>>>> >>>>> Thanks >>>>> Kevin >>>>> >>>> Here we just allow to mmap MSI-X table in kernel. It doesn't >>>> mean all KVM guest can directly read/write physical MSI-X >>>> structure. This should be decided by QEMU. For PPC64 >>>> platform, we would allow to passthrough the MSI-X table >>>> because we know guest kernel would not write physical >>>> MSI-X structure when enabling MSI. >>>> >>> A bit confused here. If guest kernel doesn't need to write >>> physical MSI-X structure, what's the point of passing through >>> the table then? >> We want to allow the MSI-X table because there may be >> some critical registers in the same page as the MSI-X table. >> We have to handle the mmio access to these register in QEMU >> rather than in guest if mmapping MSI-X table is disallowed. > So you mean critical registers in same MMIO BAR as MSI-X > table, instead of two MMIO BARs in same page (the latter I > suppose with your whole patchset it won't happen then)? Yes. That's what I mean! Thanks, Yongji -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > Sent: Thursday, May 05, 2016 7:43 PM > > Hi David and Kevin, > > On 2016/5/5 17:54, David Laight wrote: > > > From: Tian, Kevin > >> Sent: 05 May 2016 10:37 > > ... > >>> Acutually, we are not aimed at accessing MSI-X table from > >>> guest. So I think it's safe to passthrough MSI-X table if we > >>> can make sure guest kernel would not touch MSI-X table in > >>> normal code path such as para-virtualized guest kernel on PPC64. > >>> > >> Then how do you prevent malicious guest kernel accessing it? > > Or a malicious guest driver for an ethernet card setting up > > the receive buffer ring to contain a single word entry that > > contains the address associated with an MSI-X interrupt and > > then using a loopback mode to cause a specific packet be > > received that writes the required word through that address. > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > cycle. > > > > David > > > > If we have enough permission to load a malicious driver or > kernel, we can easily break the guest without exposed > MSI-X table. > > I think it should be safe to expose MSI-X table if we can > make sure that malicious guest driver/kernel can't use > the MSI-X table to break other guest or host. The > capability of IRQ remapping could provide this > kind of protection. > With IRQ remapping it doesn't mean you can pass through MSI-X structure to guest. I know actual IRQ remapping might be platform specific, but at least for Intel VT-d specification, MSI-X entry must be configured with a remappable format by host kernel which contains an index into IRQ remapping table. The index will find a IRQ remapping entry which controls interrupt routing for a specific device. If you allow a malicious program random index into MSI-X entry of assigned device, the hole is obvious... Above might make sense only for a IRQ remapping implementation which doesn't rely on extended MSI-X format (e.g. simply based on BDF). If that's the case for PPC, then you should build MSI-X passthrough based on this fact instead of general IRQ remapping enabled or not. Thanks Kevin
On 2016/5/5 20:15, Tian, Kevin wrote: >> From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] >> Sent: Thursday, May 05, 2016 7:43 PM >> >> Hi David and Kevin, >> >> On 2016/5/5 17:54, David Laight wrote: >> >>> From: Tian, Kevin >>>> Sent: 05 May 2016 10:37 >>> ... >>>>> Acutually, we are not aimed at accessing MSI-X table from >>>>> guest. So I think it's safe to passthrough MSI-X table if we >>>>> can make sure guest kernel would not touch MSI-X table in >>>>> normal code path such as para-virtualized guest kernel on PPC64. >>>>> >>>> Then how do you prevent malicious guest kernel accessing it? >>> Or a malicious guest driver for an ethernet card setting up >>> the receive buffer ring to contain a single word entry that >>> contains the address associated with an MSI-X interrupt and >>> then using a loopback mode to cause a specific packet be >>> received that writes the required word through that address. >>> >>> Remember the PCIe cycle for an interrupt is a normal memory write >>> cycle. >>> >>> David >>> >> If we have enough permission to load a malicious driver or >> kernel, we can easily break the guest without exposed >> MSI-X table. >> >> I think it should be safe to expose MSI-X table if we can >> make sure that malicious guest driver/kernel can't use >> the MSI-X table to break other guest or host. The >> capability of IRQ remapping could provide this >> kind of protection. >> > With IRQ remapping it doesn't mean you can pass through MSI-X > structure to guest. I know actual IRQ remapping might be platform > specific, but at least for Intel VT-d specification, MSI-X entry must > be configured with a remappable format by host kernel which > contains an index into IRQ remapping table. The index will find a > IRQ remapping entry which controls interrupt routing for a specific > device. If you allow a malicious program random index into MSI-X > entry of assigned device, the hole is obvious... Do you mean we can trigger MSIs that correspond to interrupt IDs of other devices by writing to MSI-X table although IRQ remapping is enabled? On PPC64, there is a mapping between MSIs and PE num which can be used to identify a PCI device on PHB. So the hardware can ensure a given pci device can only shoot the MSIs assigned for it. Isn't there a similar mapping in IRQ remapping table on Intel. Thanks, Yongji > Above might make sense only for a IRQ remapping implementation > which doesn't rely on extended MSI-X format (e.g. simply based on > BDF). If that's the case for PPC, then you should build MSI-X > passthrough based on this fact instead of general IRQ remapping > enabled or not. > > Thanks > Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 5 May 2016 12:15:46 +0000 "Tian, Kevin" <kevin.tian@intel.com> wrote: > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > Sent: Thursday, May 05, 2016 7:43 PM > > > > Hi David and Kevin, > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > From: Tian, Kevin > > >> Sent: 05 May 2016 10:37 > > > ... > > >>> Acutually, we are not aimed at accessing MSI-X table from > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > >>> can make sure guest kernel would not touch MSI-X table in > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > >>> > > >> Then how do you prevent malicious guest kernel accessing it? > > > Or a malicious guest driver for an ethernet card setting up > > > the receive buffer ring to contain a single word entry that > > > contains the address associated with an MSI-X interrupt and > > > then using a loopback mode to cause a specific packet be > > > received that writes the required word through that address. > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > cycle. > > > > > > David > > > > > > > If we have enough permission to load a malicious driver or > > kernel, we can easily break the guest without exposed > > MSI-X table. > > > > I think it should be safe to expose MSI-X table if we can > > make sure that malicious guest driver/kernel can't use > > the MSI-X table to break other guest or host. The > > capability of IRQ remapping could provide this > > kind of protection. > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > structure to guest. I know actual IRQ remapping might be platform > specific, but at least for Intel VT-d specification, MSI-X entry must > be configured with a remappable format by host kernel which > contains an index into IRQ remapping table. The index will find a > IRQ remapping entry which controls interrupt routing for a specific > device. If you allow a malicious program random index into MSI-X > entry of assigned device, the hole is obvious... > > Above might make sense only for a IRQ remapping implementation > which doesn't rely on extended MSI-X format (e.g. simply based on > BDF). If that's the case for PPC, then you should build MSI-X > passthrough based on this fact instead of general IRQ remapping > enabled or not. I don't think anyone is expecting that we can expose the MSI-X vector table to the guest and the guest can make direct use of it. The end goal here is that the guest on a power system is already paravirtualized to not program the device MSI-X by directly writing to the MSI-X vector table. They have hypercalls for this since they always run virtualized. Therefore a) they never intend to touch the MSI-X vector table and b) they have sufficient isolation that a guest can only hurt itself by doing so. On x86 we don't have a), our method of programming the MSI-X vector table is to directly write to it. Therefore we will always require QEMU to place a MemoryRegion over the vector table to intercept those accesses. However with interrupt remapping, we do have b) on x86, which means that we don't need to be so strict in disallowing user accesses to the MSI-X vector table. It's not useful for configuring MSI-X on the device, but the user should only be able to hurt themselves by writing it directly. x86 doesn't really get anything out of this change, but it helps this special case on power pretty significantly aiui. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/06/2016 01:05 AM, Alex Williamson wrote: > On Thu, 5 May 2016 12:15:46 +0000 > "Tian, Kevin" <kevin.tian@intel.com> wrote: > >>> From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] >>> Sent: Thursday, May 05, 2016 7:43 PM >>> >>> Hi David and Kevin, >>> >>> On 2016/5/5 17:54, David Laight wrote: >>> >>>> From: Tian, Kevin >>>>> Sent: 05 May 2016 10:37 >>>> ... >>>>>> Acutually, we are not aimed at accessing MSI-X table from >>>>>> guest. So I think it's safe to passthrough MSI-X table if we >>>>>> can make sure guest kernel would not touch MSI-X table in >>>>>> normal code path such as para-virtualized guest kernel on PPC64. >>>>>> >>>>> Then how do you prevent malicious guest kernel accessing it? >>>> Or a malicious guest driver for an ethernet card setting up >>>> the receive buffer ring to contain a single word entry that >>>> contains the address associated with an MSI-X interrupt and >>>> then using a loopback mode to cause a specific packet be >>>> received that writes the required word through that address. >>>> >>>> Remember the PCIe cycle for an interrupt is a normal memory write >>>> cycle. >>>> >>>> David >>>> >>> >>> If we have enough permission to load a malicious driver or >>> kernel, we can easily break the guest without exposed >>> MSI-X table. >>> >>> I think it should be safe to expose MSI-X table if we can >>> make sure that malicious guest driver/kernel can't use >>> the MSI-X table to break other guest or host. The >>> capability of IRQ remapping could provide this >>> kind of protection. >>> >> >> With IRQ remapping it doesn't mean you can pass through MSI-X >> structure to guest. I know actual IRQ remapping might be platform >> specific, but at least for Intel VT-d specification, MSI-X entry must >> be configured with a remappable format by host kernel which >> contains an index into IRQ remapping table. The index will find a >> IRQ remapping entry which controls interrupt routing for a specific >> device. If you allow a malicious program random index into MSI-X >> entry of assigned device, the hole is obvious... >> >> Above might make sense only for a IRQ remapping implementation >> which doesn't rely on extended MSI-X format (e.g. simply based on >> BDF). If that's the case for PPC, then you should build MSI-X >> passthrough based on this fact instead of general IRQ remapping >> enabled or not. > > I don't think anyone is expecting that we can expose the MSI-X vector > table to the guest and the guest can make direct use of it. The end > goal here is that the guest on a power system is already > paravirtualized to not program the device MSI-X by directly writing to > the MSI-X vector table. They have hypercalls for this since they > always run virtualized. Therefore a) they never intend to touch the > MSI-X vector table and b) they have sufficient isolation that a guest > can only hurt itself by doing so. > > On x86 we don't have a), our method of programming the MSI-X vector > table is to directly write to it. Therefore we will always require QEMU > to place a MemoryRegion over the vector table to intercept those > accesses. However with interrupt remapping, we do have b) on x86, which > means that we don't need to be so strict in disallowing user accesses > to the MSI-X vector table. It's not useful for configuring MSI-X on > the device, but the user should only be able to hurt themselves by > writing it directly. x86 doesn't really get anything out of this > change, but it helps this special case on power pretty significantly > aiui. Thanks, Excellent short overview, saved :) How do we proceed with these patches? Nobody seems objecting them but also nobody seems taking them either...
On Fri, 6 May 2016 16:35:38 +1000 Alexey Kardashevskiy <aik@ozlabs.ru> wrote: > On 05/06/2016 01:05 AM, Alex Williamson wrote: > > On Thu, 5 May 2016 12:15:46 +0000 > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > >>> From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > >>> Sent: Thursday, May 05, 2016 7:43 PM > >>> > >>> Hi David and Kevin, > >>> > >>> On 2016/5/5 17:54, David Laight wrote: > >>> > >>>> From: Tian, Kevin > >>>>> Sent: 05 May 2016 10:37 > >>>> ... > >>>>>> Acutually, we are not aimed at accessing MSI-X table from > >>>>>> guest. So I think it's safe to passthrough MSI-X table if we > >>>>>> can make sure guest kernel would not touch MSI-X table in > >>>>>> normal code path such as para-virtualized guest kernel on PPC64. > >>>>>> > >>>>> Then how do you prevent malicious guest kernel accessing it? > >>>> Or a malicious guest driver for an ethernet card setting up > >>>> the receive buffer ring to contain a single word entry that > >>>> contains the address associated with an MSI-X interrupt and > >>>> then using a loopback mode to cause a specific packet be > >>>> received that writes the required word through that address. > >>>> > >>>> Remember the PCIe cycle for an interrupt is a normal memory write > >>>> cycle. > >>>> > >>>> David > >>>> > >>> > >>> If we have enough permission to load a malicious driver or > >>> kernel, we can easily break the guest without exposed > >>> MSI-X table. > >>> > >>> I think it should be safe to expose MSI-X table if we can > >>> make sure that malicious guest driver/kernel can't use > >>> the MSI-X table to break other guest or host. The > >>> capability of IRQ remapping could provide this > >>> kind of protection. > >>> > >> > >> With IRQ remapping it doesn't mean you can pass through MSI-X > >> structure to guest. I know actual IRQ remapping might be platform > >> specific, but at least for Intel VT-d specification, MSI-X entry must > >> be configured with a remappable format by host kernel which > >> contains an index into IRQ remapping table. The index will find a > >> IRQ remapping entry which controls interrupt routing for a specific > >> device. If you allow a malicious program random index into MSI-X > >> entry of assigned device, the hole is obvious... > >> > >> Above might make sense only for a IRQ remapping implementation > >> which doesn't rely on extended MSI-X format (e.g. simply based on > >> BDF). If that's the case for PPC, then you should build MSI-X > >> passthrough based on this fact instead of general IRQ remapping > >> enabled or not. > > > > I don't think anyone is expecting that we can expose the MSI-X vector > > table to the guest and the guest can make direct use of it. The end > > goal here is that the guest on a power system is already > > paravirtualized to not program the device MSI-X by directly writing to > > the MSI-X vector table. They have hypercalls for this since they > > always run virtualized. Therefore a) they never intend to touch the > > MSI-X vector table and b) they have sufficient isolation that a guest > > can only hurt itself by doing so. > > > > On x86 we don't have a), our method of programming the MSI-X vector > > table is to directly write to it. Therefore we will always require QEMU > > to place a MemoryRegion over the vector table to intercept those > > accesses. However with interrupt remapping, we do have b) on x86, which > > means that we don't need to be so strict in disallowing user accesses > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > the device, but the user should only be able to hurt themselves by > > writing it directly. x86 doesn't really get anything out of this > > change, but it helps this special case on power pretty significantly > > aiui. Thanks, > > Excellent short overview, saved :) > > How do we proceed with these patches? Nobody seems objecting them but also > nobody seems taking them either... Well, this series is still based on some non-upstream patches, so... Once that dependency is resolved this series should probably be split into functional areas for acceptance by the appropriate subsystem maintainers. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Thursday, May 05, 2016 11:05 PM > > On Thu, 5 May 2016 12:15:46 +0000 > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > Hi David and Kevin, > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > From: Tian, Kevin > > > >> Sent: 05 May 2016 10:37 > > > > ... > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > >>> can make sure guest kernel would not touch MSI-X table in > > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > > >>> > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > Or a malicious guest driver for an ethernet card setting up > > > > the receive buffer ring to contain a single word entry that > > > > contains the address associated with an MSI-X interrupt and > > > > then using a loopback mode to cause a specific packet be > > > > received that writes the required word through that address. > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > > cycle. > > > > > > > > David > > > > > > > > > > If we have enough permission to load a malicious driver or > > > kernel, we can easily break the guest without exposed > > > MSI-X table. > > > > > > I think it should be safe to expose MSI-X table if we can > > > make sure that malicious guest driver/kernel can't use > > > the MSI-X table to break other guest or host. The > > > capability of IRQ remapping could provide this > > > kind of protection. > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > structure to guest. I know actual IRQ remapping might be platform > > specific, but at least for Intel VT-d specification, MSI-X entry must > > be configured with a remappable format by host kernel which > > contains an index into IRQ remapping table. The index will find a > > IRQ remapping entry which controls interrupt routing for a specific > > device. If you allow a malicious program random index into MSI-X > > entry of assigned device, the hole is obvious... > > > > Above might make sense only for a IRQ remapping implementation > > which doesn't rely on extended MSI-X format (e.g. simply based on > > BDF). If that's the case for PPC, then you should build MSI-X > > passthrough based on this fact instead of general IRQ remapping > > enabled or not. > > I don't think anyone is expecting that we can expose the MSI-X vector > table to the guest and the guest can make direct use of it. The end > goal here is that the guest on a power system is already > paravirtualized to not program the device MSI-X by directly writing to > the MSI-X vector table. They have hypercalls for this since they > always run virtualized. Therefore a) they never intend to touch the > MSI-X vector table and b) they have sufficient isolation that a guest > can only hurt itself by doing so. > > On x86 we don't have a), our method of programming the MSI-X vector > table is to directly write to it. Therefore we will always require QEMU > to place a MemoryRegion over the vector table to intercept those > accesses. However with interrupt remapping, we do have b) on x86, which > means that we don't need to be so strict in disallowing user accesses > to the MSI-X vector table. It's not useful for configuring MSI-X on > the device, but the user should only be able to hurt themselves by > writing it directly. x86 doesn't really get anything out of this > change, but it helps this special case on power pretty significantly > aiui. Thanks, > Allowing guest direct write to MSI-x table has system-wide impact. As I explained earlier, hypervisor needs to control "interrupt_index" programmed in MSI-X entry, which is used to associate a specific IRQ remapping entry. Now if you expose whole MSI-x table to guest, it can program random index into MSI-X entry to associate with any IRQ remapping entry and then there won't be any isolation per se. You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d spec. Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 11 May 2016 06:29:06 +0000 "Tian, Kevin" <kevin.tian@intel.com> wrote: > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > Sent: Thursday, May 05, 2016 11:05 PM > > > > On Thu, 5 May 2016 12:15:46 +0000 > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > Hi David and Kevin, > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > From: Tian, Kevin > > > > >> Sent: 05 May 2016 10:37 > > > > > ... > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > > > >>> > > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > the receive buffer ring to contain a single word entry that > > > > > contains the address associated with an MSI-X interrupt and > > > > > then using a loopback mode to cause a specific packet be > > > > > received that writes the required word through that address. > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > > > cycle. > > > > > > > > > > David > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > kernel, we can easily break the guest without exposed > > > > MSI-X table. > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > make sure that malicious guest driver/kernel can't use > > > > the MSI-X table to break other guest or host. The > > > > capability of IRQ remapping could provide this > > > > kind of protection. > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > structure to guest. I know actual IRQ remapping might be platform > > > specific, but at least for Intel VT-d specification, MSI-X entry must > > > be configured with a remappable format by host kernel which > > > contains an index into IRQ remapping table. The index will find a > > > IRQ remapping entry which controls interrupt routing for a specific > > > device. If you allow a malicious program random index into MSI-X > > > entry of assigned device, the hole is obvious... > > > > > > Above might make sense only for a IRQ remapping implementation > > > which doesn't rely on extended MSI-X format (e.g. simply based on > > > BDF). If that's the case for PPC, then you should build MSI-X > > > passthrough based on this fact instead of general IRQ remapping > > > enabled or not. > > > > I don't think anyone is expecting that we can expose the MSI-X vector > > table to the guest and the guest can make direct use of it. The end > > goal here is that the guest on a power system is already > > paravirtualized to not program the device MSI-X by directly writing to > > the MSI-X vector table. They have hypercalls for this since they > > always run virtualized. Therefore a) they never intend to touch the > > MSI-X vector table and b) they have sufficient isolation that a guest > > can only hurt itself by doing so. > > > > On x86 we don't have a), our method of programming the MSI-X vector > > table is to directly write to it. Therefore we will always require QEMU > > to place a MemoryRegion over the vector table to intercept those > > accesses. However with interrupt remapping, we do have b) on x86, which > > means that we don't need to be so strict in disallowing user accesses > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > the device, but the user should only be able to hurt themselves by > > writing it directly. x86 doesn't really get anything out of this > > change, but it helps this special case on power pretty significantly > > aiui. Thanks, > > > > Allowing guest direct write to MSI-x table has system-wide impact. > As I explained earlier, hypervisor needs to control "interrupt_index" > programmed in MSI-X entry, which is used to associate a specific > IRQ remapping entry. Now if you expose whole MSI-x table to guest, > it can program random index into MSI-X entry to associate with > any IRQ remapping entry and then there won't be any isolation per se. > > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d > spec. I think you're extrapolating beyond the vfio interface. The change here is to remove the vfio protection of the MSI-X vector table when the system provides interrupt isolation. The argument is that this is safe to do because the hardware protects the host from erroneous and malicious user programming, but it is not meant to provide a means to program MSI-X directly through the vector table. This is effectively the same as general DMA programming, if the vfio programming model is not followed the device generates iommu faults. I do have a concern that userspace driver writers are going to more often presume they can use the vector table directly because of this change, but I don't know that that is sufficient reason to prevent such a change. They'll quickly discover the device generates faults on interrupt rather than working as expected. The question of how this affects the hypervisor is completely separate. Vfio in the kernel is a userspace driver interface, not a hypervisor. QEMU is the hypervisor. We have no plans to provide the VM with direct access to the MSI-X vector table for x86 guests on QEMU. There will still be a MemoryRegion emulating access to the vector table in order to translate writes into vfio interrupt ioctls. POWER would drop the MemoryRegion so that the full page is mapped to the guest, with the expectation that the guest never makes use of it since MSI-X is always configured via hypercalls on POWER systems. Likewise I expect ARM will still make use of the MemoryRegion emulating the vector table, which leaves them exposed to the performance issue POWER is trying to solve here since ARM also has 64k page support and has no paravirtualized MSI-X programming interface afaik. x86 is not impervious to this issue either, but a 4k page size falls within the PCI spec recommendations for MSI-X structure alignment, so it's much more rare to have issues. We have certainly seen hardware vendors that ignore the PCI spec alignment recommendations, but so far only for placing device registers within the same page as the PBA, which is an easier problem to deal with since the PBA is relatively unused by drivers. This may be an area where we need to develop a paravirt interface for MSI-X programming which disable the MemoryRegion emulating the vector table when used. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Wednesday, May 11, 2016 11:54 PM > > On Wed, 11 May 2016 06:29:06 +0000 > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > On Thu, 5 May 2016 12:15:46 +0000 > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > From: Tian, Kevin > > > > > >> Sent: 05 May 2016 10:37 > > > > > > ... > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > > > > >>> > > > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > > the receive buffer ring to contain a single word entry that > > > > > > contains the address associated with an MSI-X interrupt and > > > > > > then using a loopback mode to cause a specific packet be > > > > > > received that writes the required word through that address. > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > > > > cycle. > > > > > > > > > > > > David > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > kernel, we can easily break the guest without exposed > > > > > MSI-X table. > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > make sure that malicious guest driver/kernel can't use > > > > > the MSI-X table to break other guest or host. The > > > > > capability of IRQ remapping could provide this > > > > > kind of protection. > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > structure to guest. I know actual IRQ remapping might be platform > > > > specific, but at least for Intel VT-d specification, MSI-X entry must > > > > be configured with a remappable format by host kernel which > > > > contains an index into IRQ remapping table. The index will find a > > > > IRQ remapping entry which controls interrupt routing for a specific > > > > device. If you allow a malicious program random index into MSI-X > > > > entry of assigned device, the hole is obvious... > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > which doesn't rely on extended MSI-X format (e.g. simply based on > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > passthrough based on this fact instead of general IRQ remapping > > > > enabled or not. > > > > > > I don't think anyone is expecting that we can expose the MSI-X vector > > > table to the guest and the guest can make direct use of it. The end > > > goal here is that the guest on a power system is already > > > paravirtualized to not program the device MSI-X by directly writing to > > > the MSI-X vector table. They have hypercalls for this since they > > > always run virtualized. Therefore a) they never intend to touch the > > > MSI-X vector table and b) they have sufficient isolation that a guest > > > can only hurt itself by doing so. > > > > > > On x86 we don't have a), our method of programming the MSI-X vector > > > table is to directly write to it. Therefore we will always require QEMU > > > to place a MemoryRegion over the vector table to intercept those > > > accesses. However with interrupt remapping, we do have b) on x86, which > > > means that we don't need to be so strict in disallowing user accesses > > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > > the device, but the user should only be able to hurt themselves by > > > writing it directly. x86 doesn't really get anything out of this > > > change, but it helps this special case on power pretty significantly > > > aiui. Thanks, > > > > > > > Allowing guest direct write to MSI-x table has system-wide impact. > > As I explained earlier, hypervisor needs to control "interrupt_index" > > programmed in MSI-X entry, which is used to associate a specific > > IRQ remapping entry. Now if you expose whole MSI-x table to guest, > > it can program random index into MSI-X entry to associate with > > any IRQ remapping entry and then there won't be any isolation per se. > > > > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d > > spec. > > I think you're extrapolating beyond the vfio interface. The change > here is to remove the vfio protection of the MSI-X vector table when > the system provides interrupt isolation. The argument is that this is > safe to do because the hardware protects the host from erroneous and > malicious user programming, but it is not meant to provide a means to > program MSI-X directly through the vector table. This is effectively Sorry I didn't get this point. Once we allow userspace to mmap MSI-X table, isn't it equivalent to allowing userspace directly program vector table? Or is there other mechanism to prevent direct programming? > the same as general DMA programming, if the vfio programming model is > not followed the device generates iommu faults. I do have a concern > that userspace driver writers are going to more often presume they can > use the vector table directly because of this change, but I don't know > that that is sufficient reason to prevent such a change. They'll > quickly discover the device generates faults on interrupt rather than > working as expected. If userspace can actually program vector table directly, there is not always fault triggered. As long as MSI-X table is fully under control of userspace, any interrupt index can be used here which may link to a IRQ remapping entry allocated for other devices. > > The question of how this affects the hypervisor is completely > separate. Vfio in the kernel is a userspace driver interface, not a > hypervisor. QEMU is the hypervisor. We have no plans to provide the VM > with direct access to the MSI-X vector table for x86 guests on QEMU. > There will still be a MemoryRegion emulating access to the vector table > in order to translate writes into vfio interrupt ioctls. POWER would > drop the MemoryRegion so that the full page is mapped to the guest, > with the expectation that the guest never makes use of it since MSI-X > is always configured via hypercalls on POWER systems. Likewise I > expect ARM will still make use of the MemoryRegion emulating the vector > table, which leaves them exposed to the performance issue POWER is > trying to solve here since ARM also has 64k page support and has no > paravirtualized MSI-X programming interface afaik. x86 is not > impervious to this issue either, but a 4k page size falls within the > PCI spec recommendations for MSI-X structure alignment, so it's much > more rare to have issues. We have certainly seen hardware vendors that > ignore the PCI spec alignment recommendations, but so far only for > placing device registers within the same page as the PBA, which is an > easier problem to deal with since the PBA is relatively unused by > drivers. This may be an area where we need to develop a paravirt > interface for MSI-X programming which disable the MemoryRegion > emulating the vector table when used. Thanks, > I get this point. I incorrectly line allowing userspace mmap MSI-X table to allowing guest direct access to MSI-X. Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 12 May 2016 01:19:44 +0000 "Tian, Kevin" <kevin.tian@intel.com> wrote: > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > Sent: Wednesday, May 11, 2016 11:54 PM > > > > On Wed, 11 May 2016 06:29:06 +0000 > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > > > On Thu, 5 May 2016 12:15:46 +0000 > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > > > From: Tian, Kevin > > > > > > >> Sent: 05 May 2016 10:37 > > > > > > > ... > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > > > > > >>> > > > > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > > > the receive buffer ring to contain a single word entry that > > > > > > > contains the address associated with an MSI-X interrupt and > > > > > > > then using a loopback mode to cause a specific packet be > > > > > > > received that writes the required word through that address. > > > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > > > > > cycle. > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > > kernel, we can easily break the guest without exposed > > > > > > MSI-X table. > > > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > > make sure that malicious guest driver/kernel can't use > > > > > > the MSI-X table to break other guest or host. The > > > > > > capability of IRQ remapping could provide this > > > > > > kind of protection. > > > > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > > structure to guest. I know actual IRQ remapping might be platform > > > > > specific, but at least for Intel VT-d specification, MSI-X entry must > > > > > be configured with a remappable format by host kernel which > > > > > contains an index into IRQ remapping table. The index will find a > > > > > IRQ remapping entry which controls interrupt routing for a specific > > > > > device. If you allow a malicious program random index into MSI-X > > > > > entry of assigned device, the hole is obvious... > > > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on > > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > > passthrough based on this fact instead of general IRQ remapping > > > > > enabled or not. > > > > > > > > I don't think anyone is expecting that we can expose the MSI-X vector > > > > table to the guest and the guest can make direct use of it. The end > > > > goal here is that the guest on a power system is already > > > > paravirtualized to not program the device MSI-X by directly writing to > > > > the MSI-X vector table. They have hypercalls for this since they > > > > always run virtualized. Therefore a) they never intend to touch the > > > > MSI-X vector table and b) they have sufficient isolation that a guest > > > > can only hurt itself by doing so. > > > > > > > > On x86 we don't have a), our method of programming the MSI-X vector > > > > table is to directly write to it. Therefore we will always require QEMU > > > > to place a MemoryRegion over the vector table to intercept those > > > > accesses. However with interrupt remapping, we do have b) on x86, which > > > > means that we don't need to be so strict in disallowing user accesses > > > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > > > the device, but the user should only be able to hurt themselves by > > > > writing it directly. x86 doesn't really get anything out of this > > > > change, but it helps this special case on power pretty significantly > > > > aiui. Thanks, > > > > > > > > > > Allowing guest direct write to MSI-x table has system-wide impact. > > > As I explained earlier, hypervisor needs to control "interrupt_index" > > > programmed in MSI-X entry, which is used to associate a specific > > > IRQ remapping entry. Now if you expose whole MSI-x table to guest, > > > it can program random index into MSI-X entry to associate with > > > any IRQ remapping entry and then there won't be any isolation per se. > > > > > > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d > > > spec. > > > > I think you're extrapolating beyond the vfio interface. The change > > here is to remove the vfio protection of the MSI-X vector table when > > the system provides interrupt isolation. The argument is that this is > > safe to do because the hardware protects the host from erroneous and > > malicious user programming, but it is not meant to provide a means to > > program MSI-X directly through the vector table. This is effectively > > Sorry I didn't get this point. Once we allow userspace to mmap MSI-X > table, isn't it equivalent to allowing userspace directly program vector > table? Or is there other mechanism to prevent direct programming? This would remove the mechanism that prevents direct programming. Users can configure the device to perform DMA w/o setting up an IOMMU mapping, which generates a fault. Users can incorrectly manipulate the MSI-X vector table instead of using the proper vfio programming interface, which generates a fault. > > the same as general DMA programming, if the vfio programming model is > > not followed the device generates iommu faults. I do have a concern > > that userspace driver writers are going to more often presume they can > > use the vector table directly because of this change, but I don't know > > that that is sufficient reason to prevent such a change. They'll > > quickly discover the device generates faults on interrupt rather than > > working as expected. > > If userspace can actually program vector table directly, there is not > always fault triggered. As long as MSI-X table is fully under control > of userspace, any interrupt index can be used here which may link > to a IRQ remapping entry allocated for other devices. Is not part of the vector lookup comparing the source ID of the DMA write? If not then VT-d interrupt remapping offers us no protection from a malicious guest performing DMA to the same address. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Thursday, May 12, 2016 10:21 AM > > On Thu, 12 May 2016 01:19:44 +0000 > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > Sent: Wednesday, May 11, 2016 11:54 PM > > > > > > On Wed, 11 May 2016 06:29:06 +0000 > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > > > > > On Thu, 5 May 2016 12:15:46 +0000 > > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > > > > > From: Tian, Kevin > > > > > > > >> Sent: 05 May 2016 10:37 > > > > > > > > ... > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > > > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > > > > > > >>> > > > > > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > > > > the receive buffer ring to contain a single word entry that > > > > > > > > contains the address associated with an MSI-X interrupt and > > > > > > > > then using a loopback mode to cause a specific packet be > > > > > > > > received that writes the required word through that address. > > > > > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > > > > > > cycle. > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > > > kernel, we can easily break the guest without exposed > > > > > > > MSI-X table. > > > > > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > > > make sure that malicious guest driver/kernel can't use > > > > > > > the MSI-X table to break other guest or host. The > > > > > > > capability of IRQ remapping could provide this > > > > > > > kind of protection. > > > > > > > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > > > structure to guest. I know actual IRQ remapping might be platform > > > > > > specific, but at least for Intel VT-d specification, MSI-X entry must > > > > > > be configured with a remappable format by host kernel which > > > > > > contains an index into IRQ remapping table. The index will find a > > > > > > IRQ remapping entry which controls interrupt routing for a specific > > > > > > device. If you allow a malicious program random index into MSI-X > > > > > > entry of assigned device, the hole is obvious... > > > > > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on > > > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > > > passthrough based on this fact instead of general IRQ remapping > > > > > > enabled or not. > > > > > > > > > > I don't think anyone is expecting that we can expose the MSI-X vector > > > > > table to the guest and the guest can make direct use of it. The end > > > > > goal here is that the guest on a power system is already > > > > > paravirtualized to not program the device MSI-X by directly writing to > > > > > the MSI-X vector table. They have hypercalls for this since they > > > > > always run virtualized. Therefore a) they never intend to touch the > > > > > MSI-X vector table and b) they have sufficient isolation that a guest > > > > > can only hurt itself by doing so. > > > > > > > > > > On x86 we don't have a), our method of programming the MSI-X vector > > > > > table is to directly write to it. Therefore we will always require QEMU > > > > > to place a MemoryRegion over the vector table to intercept those > > > > > accesses. However with interrupt remapping, we do have b) on x86, which > > > > > means that we don't need to be so strict in disallowing user accesses > > > > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > > > > the device, but the user should only be able to hurt themselves by > > > > > writing it directly. x86 doesn't really get anything out of this > > > > > change, but it helps this special case on power pretty significantly > > > > > aiui. Thanks, > > > > > > > > > > > > > Allowing guest direct write to MSI-x table has system-wide impact. > > > > As I explained earlier, hypervisor needs to control "interrupt_index" > > > > programmed in MSI-X entry, which is used to associate a specific > > > > IRQ remapping entry. Now if you expose whole MSI-x table to guest, > > > > it can program random index into MSI-X entry to associate with > > > > any IRQ remapping entry and then there won't be any isolation per se. > > > > > > > > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d > > > > spec. > > > > > > I think you're extrapolating beyond the vfio interface. The change > > > here is to remove the vfio protection of the MSI-X vector table when > > > the system provides interrupt isolation. The argument is that this is > > > safe to do because the hardware protects the host from erroneous and > > > malicious user programming, but it is not meant to provide a means to > > > program MSI-X directly through the vector table. This is effectively > > > > Sorry I didn't get this point. Once we allow userspace to mmap MSI-X > > table, isn't it equivalent to allowing userspace directly program vector > > table? Or is there other mechanism to prevent direct programming? > > This would remove the mechanism that prevents direct programming. > Users can configure the device to perform DMA w/o setting up an IOMMU > mapping, which generates a fault. Users can incorrectly manipulate the > MSI-X vector table instead of using the proper vfio programming > interface, which generates a fault. > > > > the same as general DMA programming, if the vfio programming model is > > > not followed the device generates iommu faults. I do have a concern > > > that userspace driver writers are going to more often presume they can > > > use the vector table directly because of this change, but I don't know > > > that that is sufficient reason to prevent such a change. They'll > > > quickly discover the device generates faults on interrupt rather than > > > working as expected. > > > > If userspace can actually program vector table directly, there is not > > always fault triggered. As long as MSI-X table is fully under control > > of userspace, any interrupt index can be used here which may link > > to a IRQ remapping entry allocated for other devices. > > Is not part of the vector lookup comparing the source ID of the DMA > write? If not then VT-d interrupt remapping offers us no protection > from a malicious guest performing DMA to the same address. Thanks, > For x86 IRQ remapping doesn't rely on existing DMAR remapping table for DMA. Instead IRQ remapping table is a global table per VT-d engine, shared by all devices behind this VT-d engine. Each IRQ remapping table entry (IRTE) can specify: - whether to validate source-id of the interrupt requests; - whether to verify the whole source id; - whether to verify some bits of the source id; ... (section 9.10 Interrupt Remapping Table Entry (IRTE) in VT-d spec) IRTE index is programmed into corresponding MSI-X entry of a device. When an interrupt message is triggered from that device, IRQ remapping hardware will use the index to find the IRTE entry. As long as host kernel has strict control of MSI-X table, it can choose to not validate source-id as long as IRTE index is trusted. That is why I concerned you cannot do this simply based on whether IRQ remapping is available on x86. If such usage is really required, you'll need some more accurate indicator per iommu structure to tell whether safe to allow mmap MSI-X to user space. Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 12 May 2016 04:53:19 +0000 "Tian, Kevin" <kevin.tian@intel.com> wrote: > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > Sent: Thursday, May 12, 2016 10:21 AM > > > > On Thu, 12 May 2016 01:19:44 +0000 > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > Sent: Wednesday, May 11, 2016 11:54 PM > > > > > > > > On Wed, 11 May 2016 06:29:06 +0000 > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > > > > > > > On Thu, 5 May 2016 12:15:46 +0000 > > > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > > > > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > > > > > > > From: Tian, Kevin > > > > > > > > >> Sent: 05 May 2016 10:37 > > > > > > > > > ... > > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > > > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > > > > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > > > > > > > >>> > > > > > > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > > > > > the receive buffer ring to contain a single word entry that > > > > > > > > > contains the address associated with an MSI-X interrupt and > > > > > > > > > then using a loopback mode to cause a specific packet be > > > > > > > > > received that writes the required word through that address. > > > > > > > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > > > > > > > cycle. > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > > > > kernel, we can easily break the guest without exposed > > > > > > > > MSI-X table. > > > > > > > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > > > > make sure that malicious guest driver/kernel can't use > > > > > > > > the MSI-X table to break other guest or host. The > > > > > > > > capability of IRQ remapping could provide this > > > > > > > > kind of protection. > > > > > > > > > > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > > > > structure to guest. I know actual IRQ remapping might be platform > > > > > > > specific, but at least for Intel VT-d specification, MSI-X entry must > > > > > > > be configured with a remappable format by host kernel which > > > > > > > contains an index into IRQ remapping table. The index will find a > > > > > > > IRQ remapping entry which controls interrupt routing for a specific > > > > > > > device. If you allow a malicious program random index into MSI-X > > > > > > > entry of assigned device, the hole is obvious... > > > > > > > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on > > > > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > > > > passthrough based on this fact instead of general IRQ remapping > > > > > > > enabled or not. > > > > > > > > > > > > I don't think anyone is expecting that we can expose the MSI-X vector > > > > > > table to the guest and the guest can make direct use of it. The end > > > > > > goal here is that the guest on a power system is already > > > > > > paravirtualized to not program the device MSI-X by directly writing to > > > > > > the MSI-X vector table. They have hypercalls for this since they > > > > > > always run virtualized. Therefore a) they never intend to touch the > > > > > > MSI-X vector table and b) they have sufficient isolation that a guest > > > > > > can only hurt itself by doing so. > > > > > > > > > > > > On x86 we don't have a), our method of programming the MSI-X vector > > > > > > table is to directly write to it. Therefore we will always require QEMU > > > > > > to place a MemoryRegion over the vector table to intercept those > > > > > > accesses. However with interrupt remapping, we do have b) on x86, which > > > > > > means that we don't need to be so strict in disallowing user accesses > > > > > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > > > > > the device, but the user should only be able to hurt themselves by > > > > > > writing it directly. x86 doesn't really get anything out of this > > > > > > change, but it helps this special case on power pretty significantly > > > > > > aiui. Thanks, > > > > > > > > > > > > > > > > Allowing guest direct write to MSI-x table has system-wide impact. > > > > > As I explained earlier, hypervisor needs to control "interrupt_index" > > > > > programmed in MSI-X entry, which is used to associate a specific > > > > > IRQ remapping entry. Now if you expose whole MSI-x table to guest, > > > > > it can program random index into MSI-X entry to associate with > > > > > any IRQ remapping entry and then there won't be any isolation per se. > > > > > > > > > > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d > > > > > spec. > > > > > > > > I think you're extrapolating beyond the vfio interface. The change > > > > here is to remove the vfio protection of the MSI-X vector table when > > > > the system provides interrupt isolation. The argument is that this is > > > > safe to do because the hardware protects the host from erroneous and > > > > malicious user programming, but it is not meant to provide a means to > > > > program MSI-X directly through the vector table. This is effectively > > > > > > Sorry I didn't get this point. Once we allow userspace to mmap MSI-X > > > table, isn't it equivalent to allowing userspace directly program vector > > > table? Or is there other mechanism to prevent direct programming? > > > > This would remove the mechanism that prevents direct programming. > > Users can configure the device to perform DMA w/o setting up an IOMMU > > mapping, which generates a fault. Users can incorrectly manipulate the > > MSI-X vector table instead of using the proper vfio programming > > interface, which generates a fault. > > > > > > the same as general DMA programming, if the vfio programming model is > > > > not followed the device generates iommu faults. I do have a concern > > > > that userspace driver writers are going to more often presume they can > > > > use the vector table directly because of this change, but I don't know > > > > that that is sufficient reason to prevent such a change. They'll > > > > quickly discover the device generates faults on interrupt rather than > > > > working as expected. > > > > > > If userspace can actually program vector table directly, there is not > > > always fault triggered. As long as MSI-X table is fully under control > > > of userspace, any interrupt index can be used here which may link > > > to a IRQ remapping entry allocated for other devices. > > > > Is not part of the vector lookup comparing the source ID of the DMA > > write? If not then VT-d interrupt remapping offers us no protection > > from a malicious guest performing DMA to the same address. Thanks, > > > > For x86 IRQ remapping doesn't rely on existing DMAR remapping table for > DMA. Instead IRQ remapping table is a global table per VT-d engine, shared > by all devices behind this VT-d engine. Each IRQ remapping table entry > (IRTE) can specify: > > - whether to validate source-id of the interrupt requests; > - whether to verify the whole source id; > - whether to verify some bits of the source id; > ... > (section 9.10 Interrupt Remapping Table Entry (IRTE) in VT-d spec) > > IRTE index is programmed into corresponding MSI-X entry of a device. > When an interrupt message is triggered from that device, IRQ remapping > hardware will use the index to find the IRTE entry. > > As long as host kernel has strict control of MSI-X table, it can choose to > not validate source-id as long as IRTE index is trusted. > > That is why I concerned you cannot do this simply based on whether > IRQ remapping is available on x86. If such usage is really required, > you'll need some more accurate indicator per iommu structure to tell > whether safe to allow mmap MSI-X to user space. As argued previously in this thread, there's nothing special about a DMA write to memory versus a DMA write to a special address that triggers an MSI vector. If the device is DMA capable, which we assume all are, it can be fooled into generating those DMA writes regardless of whether we actively block access to the MSI-X vector table itself. With respect to the IRTE, these entries are always under host control, and aside from the potential opt-in gap when using the intremap=nosid boot option, they always make use of source-id validation as Linux is written today. Compatibility mode support is also disabled. Nothing here changes that. In fact, I suspect the only advantage to blocking MSI-X vector table access w/o interrupt remapping is to avoid obvious collisions if it were to be programmed directly, it doesn't actually prevent an identical DMA transaction from being generated by other means. The MSI-X vector table of a device is always considered untrusted which is why we require user opt-ins to subvert that protection. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Friday, May 13, 2016 1:48 AM > > On Thu, 12 May 2016 04:53:19 +0000 > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > Sent: Thursday, May 12, 2016 10:21 AM > > > > > > On Thu, 12 May 2016 01:19:44 +0000 > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > > Sent: Wednesday, May 11, 2016 11:54 PM > > > > > > > > > > On Wed, 11 May 2016 06:29:06 +0000 > > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > > > > > > > > > On Thu, 5 May 2016 12:15:46 +0000 > > > > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > > > > > > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > > > > > > > > > From: Tian, Kevin > > > > > > > > > >> Sent: 05 May 2016 10:37 > > > > > > > > > > ... > > > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > > > > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > > > > > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > > > > > > > > >>> > > > > > > > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > > > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > > > > > > the receive buffer ring to contain a single word entry that > > > > > > > > > > contains the address associated with an MSI-X interrupt and > > > > > > > > > > then using a loopback mode to cause a specific packet be > > > > > > > > > > received that writes the required word through that address. > > > > > > > > > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > > > > > > > > cycle. > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > > > > > kernel, we can easily break the guest without exposed > > > > > > > > > MSI-X table. > > > > > > > > > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > > > > > make sure that malicious guest driver/kernel can't use > > > > > > > > > the MSI-X table to break other guest or host. The > > > > > > > > > capability of IRQ remapping could provide this > > > > > > > > > kind of protection. > > > > > > > > > > > > > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > > > > > structure to guest. I know actual IRQ remapping might be platform > > > > > > > > specific, but at least for Intel VT-d specification, MSI-X entry must > > > > > > > > be configured with a remappable format by host kernel which > > > > > > > > contains an index into IRQ remapping table. The index will find a > > > > > > > > IRQ remapping entry which controls interrupt routing for a specific > > > > > > > > device. If you allow a malicious program random index into MSI-X > > > > > > > > entry of assigned device, the hole is obvious... > > > > > > > > > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on > > > > > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > > > > > passthrough based on this fact instead of general IRQ remapping > > > > > > > > enabled or not. > > > > > > > > > > > > > > I don't think anyone is expecting that we can expose the MSI-X vector > > > > > > > table to the guest and the guest can make direct use of it. The end > > > > > > > goal here is that the guest on a power system is already > > > > > > > paravirtualized to not program the device MSI-X by directly writing to > > > > > > > the MSI-X vector table. They have hypercalls for this since they > > > > > > > always run virtualized. Therefore a) they never intend to touch the > > > > > > > MSI-X vector table and b) they have sufficient isolation that a guest > > > > > > > can only hurt itself by doing so. > > > > > > > > > > > > > > On x86 we don't have a), our method of programming the MSI-X vector > > > > > > > table is to directly write to it. Therefore we will always require QEMU > > > > > > > to place a MemoryRegion over the vector table to intercept those > > > > > > > accesses. However with interrupt remapping, we do have b) on x86, which > > > > > > > means that we don't need to be so strict in disallowing user accesses > > > > > > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > > > > > > the device, but the user should only be able to hurt themselves by > > > > > > > writing it directly. x86 doesn't really get anything out of this > > > > > > > change, but it helps this special case on power pretty significantly > > > > > > > aiui. Thanks, > > > > > > > > > > > > > > > > > > > Allowing guest direct write to MSI-x table has system-wide impact. > > > > > > As I explained earlier, hypervisor needs to control "interrupt_index" > > > > > > programmed in MSI-X entry, which is used to associate a specific > > > > > > IRQ remapping entry. Now if you expose whole MSI-x table to guest, > > > > > > it can program random index into MSI-X entry to associate with > > > > > > any IRQ remapping entry and then there won't be any isolation per se. > > > > > > > > > > > > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d > > > > > > spec. > > > > > > > > > > I think you're extrapolating beyond the vfio interface. The change > > > > > here is to remove the vfio protection of the MSI-X vector table when > > > > > the system provides interrupt isolation. The argument is that this is > > > > > safe to do because the hardware protects the host from erroneous and > > > > > malicious user programming, but it is not meant to provide a means to > > > > > program MSI-X directly through the vector table. This is effectively > > > > > > > > Sorry I didn't get this point. Once we allow userspace to mmap MSI-X > > > > table, isn't it equivalent to allowing userspace directly program vector > > > > table? Or is there other mechanism to prevent direct programming? > > > > > > This would remove the mechanism that prevents direct programming. > > > Users can configure the device to perform DMA w/o setting up an IOMMU > > > mapping, which generates a fault. Users can incorrectly manipulate the > > > MSI-X vector table instead of using the proper vfio programming > > > interface, which generates a fault. > > > > > > > > the same as general DMA programming, if the vfio programming model is > > > > > not followed the device generates iommu faults. I do have a concern > > > > > that userspace driver writers are going to more often presume they can > > > > > use the vector table directly because of this change, but I don't know > > > > > that that is sufficient reason to prevent such a change. They'll > > > > > quickly discover the device generates faults on interrupt rather than > > > > > working as expected. > > > > > > > > If userspace can actually program vector table directly, there is not > > > > always fault triggered. As long as MSI-X table is fully under control > > > > of userspace, any interrupt index can be used here which may link > > > > to a IRQ remapping entry allocated for other devices. > > > > > > Is not part of the vector lookup comparing the source ID of the DMA > > > write? If not then VT-d interrupt remapping offers us no protection > > > from a malicious guest performing DMA to the same address. Thanks, > > > > > > > For x86 IRQ remapping doesn't rely on existing DMAR remapping table for > > DMA. Instead IRQ remapping table is a global table per VT-d engine, shared > > by all devices behind this VT-d engine. Each IRQ remapping table entry > > (IRTE) can specify: > > > > - whether to validate source-id of the interrupt requests; > > - whether to verify the whole source id; > > - whether to verify some bits of the source id; > > ... > > (section 9.10 Interrupt Remapping Table Entry (IRTE) in VT-d spec) > > > > IRTE index is programmed into corresponding MSI-X entry of a device. > > When an interrupt message is triggered from that device, IRQ remapping > > hardware will use the index to find the IRTE entry. > > > > As long as host kernel has strict control of MSI-X table, it can choose to > > not validate source-id as long as IRTE index is trusted. > > > > That is why I concerned you cannot do this simply based on whether > > IRQ remapping is available on x86. If such usage is really required, > > you'll need some more accurate indicator per iommu structure to tell > > whether safe to allow mmap MSI-X to user space. > > As argued previously in this thread, there's nothing special about a > DMA write to memory versus a DMA write to a special address that > triggers an MSI vector. If the device is DMA capable, which we assume > all are, it can be fooled into generating those DMA writes regardless > of whether we actively block access to the MSI-X vector table itself. But with DMA remapping above can be blocked. > > With respect to the IRTE, these entries are always under host control, > and aside from the potential opt-in gap when using the intremap=nosid > boot option, they always make use of source-id validation as Linux is > written today. Compatibility mode support is also disabled. Nothing > here changes that. In fact, I suspect the only advantage to blocking VFIO as a standalone module shouldn't make assumption on how IRQ remapping is actually implemented in kernel (especially when VT-d spec allows no source ID verification). > MSI-X vector table access w/o interrupt remapping is to avoid obvious > collisions if it were to be programmed directly, it doesn't actually > prevent an identical DMA transaction from being generated by other Kernel can enable DMA remapping but disable IRQ remapping. In such case identical DMA transaction can be prevented. > means. The MSI-X vector table of a device is always considered > untrusted which is why we require user opt-ins to subvert that > protection. Thanks, > I only partially agree with this statement since there is different trust level for kernel space driver and user space driver. OS may choose to trust kernel space driver then it can enable IRQ remapping only for load balance purpose w/o source id verfification. In such case MSI-X vector table is trusted and fully under kernel control. Allowing to mmap MSI-X table in user space definitely breaks that boundary. Anyway my point is simple. Let's ignore how Linux kernel implements IRQ remapping on x86 (which may change time to time), and just focus on architectural possibility. Non-x86 platform may implement IRQ remapping completely separate from device side, then checking availability of IRQ remapping is enough to decide whether mmap MSI-X table is safe. x86 with VT-d can be configured to a mode requiring host control of both MSI-X entry and IRQ remapping hardware (without source id check). In such case it's insufficient to make decision simply based on IRQ remapping availability. We need a way to query from IRQ remapping module whether it's actually safe to mmap MSI-X. Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Tian, Kevin > Sent: Friday, May 13, 2016 10:33 AM > > > means. The MSI-X vector table of a device is always considered > > untrusted which is why we require user opt-ins to subvert that > > protection. Thanks, > > > > I only partially agree with this statement since there is different > trust level for kernel space driver and user space driver. > > OS may choose to trust kernel space driver then it can enable IRQ > remapping only for load balance purpose w/o source id verfification. > In such case MSI-X vector table is trusted and fully under kernel > control. Allowing to mmap MSI-X table in user space definitely > breaks that boundary. > > Anyway my point is simple. Let's ignore how Linux kernel implements > IRQ remapping on x86 (which may change time to time), and just > focus on architectural possibility. Non-x86 platform may implement > IRQ remapping completely separate from device side, then checking > availability of IRQ remapping is enough to decide whether mmap > MSI-X table is safe. x86 with VT-d can be configured to a mode > requiring host control of both MSI-X entry and IRQ remapping hardware > (without source id check). In such case it's insufficient to make > decision simply based on IRQ remapping availability. We need a way > to query from IRQ remapping module whether it's actually safe to > mmap MSI-X. > Or another way is to have VFIO explicitly request intel-iommu to enforce sid check when IRQ remapping is enabled... Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 13 May 2016 02:33:18 +0000 "Tian, Kevin" <kevin.tian@intel.com> wrote: > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > Sent: Friday, May 13, 2016 1:48 AM > > > > On Thu, 12 May 2016 04:53:19 +0000 > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > Sent: Thursday, May 12, 2016 10:21 AM > > > > > > > > On Thu, 12 May 2016 01:19:44 +0000 > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > > > Sent: Wednesday, May 11, 2016 11:54 PM > > > > > > > > > > > > On Wed, 11 May 2016 06:29:06 +0000 > > > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com] > > > > > > > > Sent: Thursday, May 05, 2016 11:05 PM > > > > > > > > > > > > > > > > On Thu, 5 May 2016 12:15:46 +0000 > > > > > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > > > > > > > > > From: Yongji Xie [mailto:xyjxie@linux.vnet.ibm.com] > > > > > > > > > > Sent: Thursday, May 05, 2016 7:43 PM > > > > > > > > > > > > > > > > > > > > Hi David and Kevin, > > > > > > > > > > > > > > > > > > > > On 2016/5/5 17:54, David Laight wrote: > > > > > > > > > > > > > > > > > > > > > From: Tian, Kevin > > > > > > > > > > >> Sent: 05 May 2016 10:37 > > > > > > > > > > > ... > > > > > > > > > > >>> Acutually, we are not aimed at accessing MSI-X table from > > > > > > > > > > >>> guest. So I think it's safe to passthrough MSI-X table if we > > > > > > > > > > >>> can make sure guest kernel would not touch MSI-X table in > > > > > > > > > > >>> normal code path such as para-virtualized guest kernel on PPC64. > > > > > > > > > > >>> > > > > > > > > > > >> Then how do you prevent malicious guest kernel accessing it? > > > > > > > > > > > Or a malicious guest driver for an ethernet card setting up > > > > > > > > > > > the receive buffer ring to contain a single word entry that > > > > > > > > > > > contains the address associated with an MSI-X interrupt and > > > > > > > > > > > then using a loopback mode to cause a specific packet be > > > > > > > > > > > received that writes the required word through that address. > > > > > > > > > > > > > > > > > > > > > > Remember the PCIe cycle for an interrupt is a normal memory write > > > > > > > > > > > cycle. > > > > > > > > > > > > > > > > > > > > > > David > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > If we have enough permission to load a malicious driver or > > > > > > > > > > kernel, we can easily break the guest without exposed > > > > > > > > > > MSI-X table. > > > > > > > > > > > > > > > > > > > > I think it should be safe to expose MSI-X table if we can > > > > > > > > > > make sure that malicious guest driver/kernel can't use > > > > > > > > > > the MSI-X table to break other guest or host. The > > > > > > > > > > capability of IRQ remapping could provide this > > > > > > > > > > kind of protection. > > > > > > > > > > > > > > > > > > > > > > > > > > > > With IRQ remapping it doesn't mean you can pass through MSI-X > > > > > > > > > structure to guest. I know actual IRQ remapping might be platform > > > > > > > > > specific, but at least for Intel VT-d specification, MSI-X entry must > > > > > > > > > be configured with a remappable format by host kernel which > > > > > > > > > contains an index into IRQ remapping table. The index will find a > > > > > > > > > IRQ remapping entry which controls interrupt routing for a specific > > > > > > > > > device. If you allow a malicious program random index into MSI-X > > > > > > > > > entry of assigned device, the hole is obvious... > > > > > > > > > > > > > > > > > > Above might make sense only for a IRQ remapping implementation > > > > > > > > > which doesn't rely on extended MSI-X format (e.g. simply based on > > > > > > > > > BDF). If that's the case for PPC, then you should build MSI-X > > > > > > > > > passthrough based on this fact instead of general IRQ remapping > > > > > > > > > enabled or not. > > > > > > > > > > > > > > > > I don't think anyone is expecting that we can expose the MSI-X vector > > > > > > > > table to the guest and the guest can make direct use of it. The end > > > > > > > > goal here is that the guest on a power system is already > > > > > > > > paravirtualized to not program the device MSI-X by directly writing to > > > > > > > > the MSI-X vector table. They have hypercalls for this since they > > > > > > > > always run virtualized. Therefore a) they never intend to touch the > > > > > > > > MSI-X vector table and b) they have sufficient isolation that a guest > > > > > > > > can only hurt itself by doing so. > > > > > > > > > > > > > > > > On x86 we don't have a), our method of programming the MSI-X vector > > > > > > > > table is to directly write to it. Therefore we will always require QEMU > > > > > > > > to place a MemoryRegion over the vector table to intercept those > > > > > > > > accesses. However with interrupt remapping, we do have b) on x86, which > > > > > > > > means that we don't need to be so strict in disallowing user accesses > > > > > > > > to the MSI-X vector table. It's not useful for configuring MSI-X on > > > > > > > > the device, but the user should only be able to hurt themselves by > > > > > > > > writing it directly. x86 doesn't really get anything out of this > > > > > > > > change, but it helps this special case on power pretty significantly > > > > > > > > aiui. Thanks, > > > > > > > > > > > > > > > > > > > > > > Allowing guest direct write to MSI-x table has system-wide impact. > > > > > > > As I explained earlier, hypervisor needs to control "interrupt_index" > > > > > > > programmed in MSI-X entry, which is used to associate a specific > > > > > > > IRQ remapping entry. Now if you expose whole MSI-x table to guest, > > > > > > > it can program random index into MSI-X entry to associate with > > > > > > > any IRQ remapping entry and then there won't be any isolation per se. > > > > > > > > > > > > > > You can check "5.5.2 MSI and MSI-X Register Programming" in VT-d > > > > > > > spec. > > > > > > > > > > > > I think you're extrapolating beyond the vfio interface. The change > > > > > > here is to remove the vfio protection of the MSI-X vector table when > > > > > > the system provides interrupt isolation. The argument is that this is > > > > > > safe to do because the hardware protects the host from erroneous and > > > > > > malicious user programming, but it is not meant to provide a means to > > > > > > program MSI-X directly through the vector table. This is effectively > > > > > > > > > > Sorry I didn't get this point. Once we allow userspace to mmap MSI-X > > > > > table, isn't it equivalent to allowing userspace directly program vector > > > > > table? Or is there other mechanism to prevent direct programming? > > > > > > > > This would remove the mechanism that prevents direct programming. > > > > Users can configure the device to perform DMA w/o setting up an IOMMU > > > > mapping, which generates a fault. Users can incorrectly manipulate the > > > > MSI-X vector table instead of using the proper vfio programming > > > > interface, which generates a fault. > > > > > > > > > > the same as general DMA programming, if the vfio programming model is > > > > > > not followed the device generates iommu faults. I do have a concern > > > > > > that userspace driver writers are going to more often presume they can > > > > > > use the vector table directly because of this change, but I don't know > > > > > > that that is sufficient reason to prevent such a change. They'll > > > > > > quickly discover the device generates faults on interrupt rather than > > > > > > working as expected. > > > > > > > > > > If userspace can actually program vector table directly, there is not > > > > > always fault triggered. As long as MSI-X table is fully under control > > > > > of userspace, any interrupt index can be used here which may link > > > > > to a IRQ remapping entry allocated for other devices. > > > > > > > > Is not part of the vector lookup comparing the source ID of the DMA > > > > write? If not then VT-d interrupt remapping offers us no protection > > > > from a malicious guest performing DMA to the same address. Thanks, > > > > > > > > > > For x86 IRQ remapping doesn't rely on existing DMAR remapping table for > > > DMA. Instead IRQ remapping table is a global table per VT-d engine, shared > > > by all devices behind this VT-d engine. Each IRQ remapping table entry > > > (IRTE) can specify: > > > > > > - whether to validate source-id of the interrupt requests; > > > - whether to verify the whole source id; > > > - whether to verify some bits of the source id; > > > ... > > > (section 9.10 Interrupt Remapping Table Entry (IRTE) in VT-d spec) > > > > > > IRTE index is programmed into corresponding MSI-X entry of a device. > > > When an interrupt message is triggered from that device, IRQ remapping > > > hardware will use the index to find the IRTE entry. > > > > > > As long as host kernel has strict control of MSI-X table, it can choose to > > > not validate source-id as long as IRTE index is trusted. > > > > > > That is why I concerned you cannot do this simply based on whether > > > IRQ remapping is available on x86. If such usage is really required, > > > you'll need some more accurate indicator per iommu structure to tell > > > whether safe to allow mmap MSI-X to user space. > > > > As argued previously in this thread, there's nothing special about a > > DMA write to memory versus a DMA write to a special address that > > triggers an MSI vector. If the device is DMA capable, which we assume > > all are, it can be fooled into generating those DMA writes regardless > > of whether we actively block access to the MSI-X vector table itself. > > But with DMA remapping above can be blocked. How? VT-d explicitly ignores DMA writes to 0xFEEx_xxxx, section 3.13: Write requests without PASID of DWORD length are treated as interrupt requests. Interrupt requests are not subjected to DMA remapping[...] Instead, remapping hardware can be enabled to subject such interrupt requests to interrupt remapping. > > With respect to the IRTE, these entries are always under host control, > > and aside from the potential opt-in gap when using the intremap=nosid > > boot option, they always make use of source-id validation as Linux is > > written today. Compatibility mode support is also disabled. Nothing > > here changes that. In fact, I suspect the only advantage to blocking > > VFIO as a standalone module shouldn't make assumption on how > IRQ remapping is actually implemented in kernel (especially when > VT-d spec allows no source ID verification). vfio is NOT a stand alone module, it is part of the kernel that happens to be configurable as a module. We have the ability to change with the kernel and influence how the kernel behaves. > > MSI-X vector table access w/o interrupt remapping is to avoid obvious > > collisions if it were to be programmed directly, it doesn't actually > > prevent an identical DMA transaction from being generated by other > > Kernel can enable DMA remapping but disable IRQ remapping. In such > case identical DMA transaction can be prevented. Not according to the VT-d spec as quoted above. If so, how? > > means. The MSI-X vector table of a device is always considered > > untrusted which is why we require user opt-ins to subvert that > > protection. Thanks, > > > > I only partially agree with this statement since there is different > trust level for kernel space driver and user space driver. > > OS may choose to trust kernel space driver then it can enable IRQ > remapping only for load balance purpose w/o source id verfification. > In such case MSI-X vector table is trusted and fully under kernel > control. Allowing to mmap MSI-X table in user space definitely > breaks that boundary. The OS may choose to do this, but we're part of the OS and it doesn't do this and has no reason to do this. > Anyway my point is simple. Let's ignore how Linux kernel implements > IRQ remapping on x86 (which may change time to time), and just > focus on architectural possibility. Non-x86 platform may implement > IRQ remapping completely separate from device side, then checking > availability of IRQ remapping is enough to decide whether mmap > MSI-X table is safe. x86 with VT-d can be configured to a mode > requiring host control of both MSI-X entry and IRQ remapping hardware > (without source id check). In such case it's insufficient to make > decision simply based on IRQ remapping availability. We need a way > to query from IRQ remapping module whether it's actually safe to > mmap MSI-X. We're going in circles here. This patch is attempting to remove protection from the MSI-X vector table that is really nothing more than security theater already. That "protection" only actually prevents casual misuse of the API which is really only a problem when the platform offers no form of interrupt isolation, such as VT-d with DMA remapping but not interrupt remapping. Disabling source-id checking in VT-d should be handled as the equivalent of disabling interrupt remapping altogether as far as the IOMMU API is concerned. That's a trivial gap that should be fixed. There is no such thing as a secure MSI-X vector table when untrusted userspace drivers are involved, we must always assume that a device can generate DMA writes that are indistinguishable from actual interrupt requests and if the platform does not provide interrupt isolation we should require the user to opt-in to an unsafe mode. Simply denying direct writes to the vector table or preventing mapping of the vector table into the user address space does not provide any tangible form of protection. Many devices make use of window registers that allow backdoors to arbitrary device registers. Some drivers even use this as the primary means for configuring MSI-X, which makes them incompatible with device assignment without device specific quirks to enable virtualization of these paths. If you have an objection to this patch, please show me how preventing direct CPU access to the MSI-X vector table provides any kind of security guarantee of the contents of the vector table and also prove to me that a device cannot spoof a DMA write that is indistinguishable from one associated with an actual interrupt, regardless of the contents of the MSI-X vector table. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: Friday, May 13, 2016 1:33 PM > > > > > > As argued previously in this thread, there's nothing special about a > > > DMA write to memory versus a DMA write to a special address that > > > triggers an MSI vector. If the device is DMA capable, which we assume > > > all are, it can be fooled into generating those DMA writes regardless > > > of whether we actively block access to the MSI-X vector table itself. > > > > But with DMA remapping above can be blocked. > > How? VT-d explicitly ignores DMA writes to 0xFEEx_xxxx, section 3.13: > > Write requests without PASID of DWORD length are treated as interrupt > requests. Interrupt requests are not subjected to DMA remapping[...] > Instead, remapping hardware can be enabled to subject such interrupt > requests to interrupt remapping. Thanks for catching this! > > > > MSI-X vector table access w/o interrupt remapping is to avoid obvious > > > collisions if it were to be programmed directly, it doesn't actually > > > prevent an identical DMA transaction from being generated by other > > > > Kernel can enable DMA remapping but disable IRQ remapping. In such > > case identical DMA transaction can be prevented. > > Not according to the VT-d spec as quoted above. If so, how? So my argument on this is wrong. sorry. > > > Anyway my point is simple. Let's ignore how Linux kernel implements > > IRQ remapping on x86 (which may change time to time), and just > > focus on architectural possibility. Non-x86 platform may implement > > IRQ remapping completely separate from device side, then checking > > availability of IRQ remapping is enough to decide whether mmap > > MSI-X table is safe. x86 with VT-d can be configured to a mode > > requiring host control of both MSI-X entry and IRQ remapping hardware > > (without source id check). In such case it's insufficient to make > > decision simply based on IRQ remapping availability. We need a way > > to query from IRQ remapping module whether it's actually safe to > > mmap MSI-X. > > We're going in circles here. This patch is attempting to remove > protection from the MSI-X vector table that is really nothing more than > security theater already. That "protection" only actually prevents > casual misuse of the API which is really only a problem when the > platform offers no form of interrupt isolation, such as VT-d with DMA > remapping but not interrupt remapping. Disabling source-id checking in > VT-d should be handled as the equivalent of disabling interrupt > remapping altogether as far as the IOMMU API is concerned. That's > a trivial gap that should be fixed. There is no such thing as a secure That is the main change I'm asking against original patch, which has: +static void pci_check_msi_remapping(struct pci_dev *pdev, + const struct iommu_ops *ops) +{ + struct pci_bus *bus = pdev->bus; + + if (ops->capable(IOMMU_CAP_INTR_REMAP) && + !(bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) + bus->bus_flags |= PCI_BUS_FLAGS_MSI_REMAP; +} + Above flag should be cleared when source-id checking is disabled on x86. Yes, VFIO is part of OS but any assumption we made about other parts needs to be reflected accurately in the code. > MSI-X vector table when untrusted userspace drivers are involved, we > must always assume that a device can generate DMA writes that are > indistinguishable from actual interrupt requests and if the platform > does not provide interrupt isolation we should require the user to > opt-in to an unsafe mode. > > Simply denying direct writes to the vector table or preventing mapping > of the vector table into the user address space does not provide any > tangible form of protection. Many devices make use of window registers > that allow backdoors to arbitrary device registers. Some drivers even > use this as the primary means for configuring MSI-X, which makes them > incompatible with device assignment without device specific quirks to > enable virtualization of these paths. > > If you have an objection to this patch, please show me how preventing > direct CPU access to the MSI-X vector table provides any kind of > security guarantee of the contents of the vector table and also prove > to me that a device cannot spoof a DMA write that is indistinguishable > from one associated with an actual interrupt, regardless of the > contents of the MSI-X vector table. Thanks, > I'm not object to the whole patch series. As replied above, my point is just that current condition of allowing mmap MSI-X in this patch is not accurate, but my argument on security manner is not correct. Thanks for your elaboration to make it clear. Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Alex Williamson [mailto:alex.williamson@redhat.com] > Sent: 13 May 2016 06:33 ... > Simply denying direct writes to the vector table or preventing mapping > of the vector table into the user address space does not provide any > tangible form of protection. Many devices make use of window registers > that allow backdoors to arbitrary device registers. Some drivers even > use this as the primary means for configuring MSI-X, which makes them > incompatible with device assignment without device specific quirks to > enable virtualization of these paths. We have one fgpa based PCIe slave where the device driver has to read the MSI-X table and then write the value to other fpga registers so that the logic can generate the correct PCIe write cycle when an interrupt is requested. The MSI-X table itself is only as a PCIe slave. We also have host accessible DMA controllers that the device driver uses to copy data to kernel memory. These could easily be used to generate arbitrary MSI-X requests. As I've said earlier it is almost certainly possible to get any ethernet hardware to perform something similar. So without hardware that is able to limit the memory and MSI-X that each PCIe endpoint can access I believe that if a virtualisation system gives a guest kernel direct access to a PCIe devices it gives the guest kernel the ability to raise and MSI-X interrupt and read/write any physical memory. (I've not looked at the cpu virtualisation support, but do know what the PCIe devices can do.) More interestingly, probably the 'worst' thing (from a security point of view) that changing the MSI-X table lets you do is a write to an arbitrary physical memory address. David -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 98059df..7eba77d 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -591,7 +591,9 @@ static long vfio_pci_ioctl(void *device_data, pci_resource_flags(pdev, info.index) & IORESOURCE_MEM && info.size >= PAGE_SIZE) { info.flags |= VFIO_REGION_INFO_FLAG_MMAP; - if (info.index == vdev->msix_bar) { + if (info.index == vdev->msix_bar && + !(pdev->bus->bus_flags & + PCI_BUS_FLAGS_MSI_REMAP)) { ret = msix_sparse_mmap_cap(vdev, &caps); if (ret) return ret; @@ -1023,7 +1025,8 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) if (phys_len < PAGE_SIZE || req_start + req_len > phys_len) return -EINVAL; - if (index == vdev->msix_bar) { + if (index == vdev->msix_bar && + !(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) { /* * Disallow mmaps overlapping the MSI-X table; users don't * get to touch this directly. We could find somewhere diff --git a/drivers/vfio/pci/vfio_pci_rdwr.c b/drivers/vfio/pci/vfio_pci_rdwr.c index 5ffd1d9..dbf9cd0 100644 --- a/drivers/vfio/pci/vfio_pci_rdwr.c +++ b/drivers/vfio/pci/vfio_pci_rdwr.c @@ -164,7 +164,8 @@ ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf, } else io = vdev->barmap[bar]; - if (bar == vdev->msix_bar) { + if (bar == vdev->msix_bar && + !(pdev->bus->bus_flags & PCI_BUS_FLAGS_MSI_REMAP)) { x_start = vdev->msix_offset; x_end = vdev->msix_offset + vdev->msix_size; }
This patch enables mmapping MSI-X tables if hardware supports interrupt remapping which can ensure that a given pci device can only shoot the MSIs assigned for it. With MSI-X table mmapped, we also need to expose the read/write interface which will be used to access MSI-X table. Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com> --- drivers/vfio/pci/vfio_pci.c | 7 +++++-- drivers/vfio/pci/vfio_pci_rdwr.c | 3 ++- 2 files changed, 7 insertions(+), 3 deletions(-)