Message ID | 819398f808613a1109bc06440268b8746e7540d4.1568475323.git-series.marmarek@invisiblethingslab.com (mailing list archive) |
---|---|
State | Superseded |
Series | Fix PCI passthrough for HVM with stubdomain
On 14.09.2019 17:37, Marek Marczykowski-Górecki wrote:
> Allow device model running in stubdomain to enable/disable INTx/MSI(-X), bypassing pciback. While pciback is still used to access config space from within stubdomain, it refuse to write to PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE in non-permissive mode. Which is the right thing to do for PV domain (the main use case for pciback), as PV domain should use XEN_PCI_OP_* commands for that. Unfortunately those commands are not good for stubdomain use, as they configure MSI in dom0's kernel too, which should not happen for HVM domain.

Why the "for HVM domain" here? I.e. why would this be correct for a PV domain? Besides my dislike for such a bypass (imo all of the handling should go through pciback, or none of it) I continue to wonder whether the problem can't be addressed by a pciback change. And even if not, I'd still wonder whether the request shouldn't go through pciback, to retain proper layering. Ultimately it may be better to have even the map/unmap go through pciback (it's at least an apparent violation of the original physdev-op model that these two are XSM_DM_PRIV).

Irrespective of this a couple of comments on the patch itself:

> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -1443,6 +1443,51 @@ int pci_restore_msi_state(struct pci_dev *pdev)
>      return 0;
>  }
>
> +int msi_control(struct pci_dev *pdev, bool msix, bool enable)
> +{
> +    unsigned int cap = msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
> +    unsigned int other_cap = msix ? PCI_CAP_ID_MSI : PCI_CAP_ID_MSIX;
> +    uint16_t cmd;
> +
> +    if ( !use_msi )
> +        return -EOPNOTSUPP;
> +
> +    if ( !pci_find_cap_offset(pdev->seg,
> +                              pdev->bus,
> +                              PCI_SLOT(pdev->devfn),
> +                              PCI_FUNC(pdev->devfn),

Please don't use PCI_SLOT() and PCI_FUNC() anymore, now that we have pdev->dev and pdev->fn. And please put multiple arguments on one line, as many as will fit.

> +                              cap) )
> +        return -ENODEV;
> +
> +    cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
> +
> +    /* don't allow enabling MSI(-X) and INTx at the same time */
> +    if ( enable && ! (cmd & PCI_COMMAND_INTX_DISABLE) )

Stray blank after ! .

> +        return -EBUSY;
> +
> +    /* don't allow enabling both MSI and MSI-X at the same time */
> +    if ( enable && find_msi_entry(pdev, -1, other_cap) )
> +        return -EBUSY;

Combine both if()-s, since they both check "enable"? Also - comment style again (should start with capital letter); more instances elsewhere.

> +int intx_control(struct pci_dev *pdev, bool enable)
> +{
> +    /* don't allow enabling INTx if MSI(-X) is already enabled */
> +    if ( enable && find_msi_entry(pdev, -1, PCI_CAP_ID_MSI) )
> +        return -EBUSY;
> +    if ( enable && find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX) )
> +        return -EBUSY;

Here even more so you want to combine both if()-s.

> +    pci_intx(pdev, enable);
> +    return 0;
> +}

Blank line ahead of main return statement please, and I guess another blank line ahead of the pci_intx() invocation wouldn't hurt either.
> --- a/xen/arch/x86/physdev.c
> +++ b/xen/arch/x86/physdev.c
> @@ -662,6 +662,59 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>          break;
>      }
>
> +    case PHYSDEVOP_interrupt_control: {
> +        struct physdev_interrupt_control op;
> +        struct pci_dev *pdev;
> +        int intr_type;
> +        bool enable;
> +
> +        ret = -EFAULT;
> +        if ( copy_from_guest(&op, arg, 1) )
> +            break;
> +
> +        ret = -EINVAL;
> +        if ( op.flags & ~(PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK |
> +                          PHYSDEVOP_INTERRUPT_CONTROL_ENABLE) )
> +            break;
> +
> +        intr_type = op.flags & PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK;
> +        enable = op.flags & PHYSDEVOP_INTERRUPT_CONTROL_ENABLE;
> +
> +        pcidevs_lock();
> +        pdev = pci_get_pdev(op.seg, op.bus, op.devfn);
> +        ret = -ENODEV;
> +        /* explicitly exclude hidden devices */
> +        if ( !pdev || pdev->domain == dom_xen )

The right side should be avoided by reducing the scope of the device lookup right away, through use of pci_get_pdev_by_domain(). This will also ensure we don't exclusively rely on the XSM check below to prevent abuse of this operation. (FAOD, while pci_get_pdev_by_domain() doesn't assert that the pcidevs lock is held, you should still acquire it here. That missing ASSERT() should get added as soon as other violators of the locking requirement have been taken care of.)

> +            goto pci_unlock;
> +
> +        ret = xsm_interrupt_control(XSM_DM_PRIV,
> +                                    pdev->domain,
> +                                    pdev->sbdf.sbdf,
> +                                    intr_type,
> +                                    enable);
> +        if ( ret )
> +            goto pci_unlock;
> +
> +        switch ( intr_type )
> +        {
> +        case PHYSDEVOP_INTERRUPT_CONTROL_INTX:
> +            ret = intx_control(pdev, enable);
> +            break;
> +        case PHYSDEVOP_INTERRUPT_CONTROL_MSI:
> +            ret = msi_control(pdev, false, enable);
> +            break;
> +        case PHYSDEVOP_INTERRUPT_CONTROL_MSIX:
> +            ret = msi_control(pdev, true, enable);
> +            break;
> +        default:
> +            ret = -EINVAL;
> +            break;

Indentation and blank lines between independent case blocks please.

> +        }
> +pci_unlock:

Labels indented by at least one blank, please.

> --- a/xen/include/public/physdev.h
> +++ b/xen/include/public/physdev.h
> @@ -345,6 +345,29 @@ typedef struct physdev_dbgp_op physdev_dbgp_op_t;
>  DEFINE_XEN_GUEST_HANDLE(physdev_dbgp_op_t);
>
>  /*
> + * Choose which interrupt type to control. If neither MSI nor MSI-X is chosen,
> + * will apply to INTx - for convenience define PHYSDEVOP_INTERRUPT_CONTROL_INTX
> + * and PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK
> + */
> +#define PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK 3
> +#define PHYSDEVOP_INTERRUPT_CONTROL_INTX 0
> +#define PHYSDEVOP_INTERRUPT_CONTROL_MSI 1
> +#define PHYSDEVOP_INTERRUPT_CONTROL_MSIX 2
> +/* when PHYSDEVOP_INTERRUPT_CONTROL_ENABLE not set, disable */
> +#define PHYSDEVOP_INTERRUPT_CONTROL_ENABLE 4
> +
> +#define PHYSDEVOP_interrupt_control 32
> +struct physdev_interrupt_control {
> +    /* IN */
> +    uint16_t seg;
> +    uint8_t bus;
> +    uint8_t devfn;

Please re-use struct physdev_pci_device for these.

Jan
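Purely as an illustration of the review comments above (this is not a revision that was posted to the list), the two helpers could end up looking roughly like this: merged if() conditions, pdev->dev/pdev->fn instead of PCI_SLOT()/PCI_FUNC(), comments starting with a capital letter, and the requested blank lines.

/* Sketch only: msi_control() with the style comments folded in. */
int msi_control(struct pci_dev *pdev, bool msix, bool enable)
{
    unsigned int cap = msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
    unsigned int other_cap = msix ? PCI_CAP_ID_MSI : PCI_CAP_ID_MSIX;
    uint16_t cmd;

    if ( !use_msi )
        return -EOPNOTSUPP;

    /* pdev->dev / pdev->fn instead of PCI_SLOT() / PCI_FUNC(). */
    if ( !pci_find_cap_offset(pdev->seg, pdev->bus, pdev->dev, pdev->fn, cap) )
        return -ENODEV;

    cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);

    /* Refuse to enable MSI(-X) while INTx or the other MSI flavor is active. */
    if ( enable &&
         (!(cmd & PCI_COMMAND_INTX_DISABLE) ||
          find_msi_entry(pdev, -1, other_cap)) )
        return -EBUSY;

    if ( msix )
        msix_set_enable(pdev, enable);
    else
        msi_set_enable(pdev, enable);

    return 0;
}

/* Sketch only: intx_control() with both checks merged and blank lines added. */
int intx_control(struct pci_dev *pdev, bool enable)
{
    /* Refuse to enable INTx if either MSI or MSI-X is already enabled. */
    if ( enable &&
         (find_msi_entry(pdev, -1, PCI_CAP_ID_MSI) ||
          find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX)) )
        return -EBUSY;

    pci_intx(pdev, enable);

    return 0;
}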
On Fri, Sep 20, 2019 at 12:10:09PM +0200, Jan Beulich wrote:
> On 14.09.2019 17:37, Marek Marczykowski-Górecki wrote:
> > Allow device model running in stubdomain to enable/disable INTx/MSI(-X), bypassing pciback. While pciback is still used to access config space from within stubdomain, it refuse to write to PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE in non-permissive mode. Which is the right thing to do for PV domain (the main use case for pciback), as PV domain should use XEN_PCI_OP_* commands for that. Unfortunately those commands are not good for stubdomain use, as they configure MSI in dom0's kernel too, which should not happen for HVM domain.
>
> Why the "for HVM domain" here? I.e. why would this be correct for a PV domain? Besides my dislike for such a bypass (imo all of the handling should go through pciback, or none of it) I continue to wonder whether the problem can't be addressed by a pciback change. And even if not, I'd still wonder whether the request shouldn't go through pciback, to retain proper layering. Ultimately it may be better to have even the map/unmap go through pciback (it's at least an apparent violation of the original physdev-op model that these two are XSM_DM_PRIV).

Technically it should be possible to move this part to pciback, and in fact this is what I've considered in the first version of this series. But Roger points out on each version[1] of this series that pciback is meant to serve *PV* domains, where PCI passthrough is a completely different beast. In fact, I even consider that using pcifront in a Linux stubdomain as a proxy for qemu there may be a bad idea (one needs to be careful to avoid stubdomain kernel fighting with qemu about device state).

Roger, what is the state of Xen internal vPCI? If handling PCI passthrough in Xen (or maybe standalone emulator), without qemu help is going to happen sooner than later (I guess not 4.13, but maybe 4.14?), then maybe this whole patch doesn't make sense as a temporary measure?

Anyway, if you all agree that pciback should be the way to go, I can go that route too. In practice, it would be a flag (set by the toolstack?) allowing writes to appropriate config space registers directly (with appropriate checks, as in this patch).

[1] https://lists.xenproject.org/archives/html/xen-devel/2019-08/msg00486.html
On 20.09.2019 18:02, Marek Marczykowski-Górecki wrote:
> On Fri, Sep 20, 2019 at 12:10:09PM +0200, Jan Beulich wrote:
>> On 14.09.2019 17:37, Marek Marczykowski-Górecki wrote:
>>> Allow device model running in stubdomain to enable/disable INTx/MSI(-X), bypassing pciback. While pciback is still used to access config space from within stubdomain, it refuse to write to PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE in non-permissive mode. Which is the right thing to do for PV domain (the main use case for pciback), as PV domain should use XEN_PCI_OP_* commands for that. Unfortunately those commands are not good for stubdomain use, as they configure MSI in dom0's kernel too, which should not happen for HVM domain.
>>
>> Why the "for HVM domain" here? I.e. why would this be correct for a PV domain? Besides my dislike for such a bypass (imo all of the handling should go through pciback, or none of it) I continue to wonder whether the problem can't be addressed by a pciback change. And even if not, I'd still wonder whether the request shouldn't go through pciback, to retain proper layering. Ultimately it may be better to have even the map/unmap go through pciback (it's at least an apparent violation of the original physdev-op model that these two are XSM_DM_PRIV).
>
> Technically it should be possible to move this part to pciback, and in fact this is what I've considered in the first version of this series. But Roger points out on each version[1] of this series that pciback is meant to serve *PV* domains, where a PCI passthrough is a completely different different beast. In fact, I even consider that using pcifront in a Linux stubdomain as a proxy for qemu there may be a bad idea (one needs to be careful to avoid stubdomain kernel fighting with qemu about device state).

Well, not using pciback _at all_ in this case would be another option. What I dislike is the furthering of hybrid-ness.

> Anyway, if you all agree that pciback should be the way to go, I can go that route too. In practice, it would be a flag (set by the toolstack?) allowing writes to appropriate config space registers directly (with appropriate checks, as in this patch).

I'm afraid I don't agree: How would allowing writes to more config space registers by a stubdom be safe?

Jan
On Mon, Sep 23, 2019 at 09:58:27AM +0200, Jan Beulich wrote:
> On 20.09.2019 18:02, Marek Marczykowski-Górecki wrote:
> > On Fri, Sep 20, 2019 at 12:10:09PM +0200, Jan Beulich wrote:
> >> On 14.09.2019 17:37, Marek Marczykowski-Górecki wrote:
> >>> Allow device model running in stubdomain to enable/disable INTx/MSI(-X), bypassing pciback. While pciback is still used to access config space from within stubdomain, it refuse to write to PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE in non-permissive mode. Which is the right thing to do for PV domain (the main use case for pciback), as PV domain should use XEN_PCI_OP_* commands for that. Unfortunately those commands are not good for stubdomain use, as they configure MSI in dom0's kernel too, which should not happen for HVM domain.
> >>
> >> Why the "for HVM domain" here? I.e. why would this be correct for a PV domain? Besides my dislike for such a bypass (imo all of the handling should go through pciback, or none of it) I continue to wonder whether the problem can't be addressed by a pciback change. And even if not, I'd still wonder whether the request shouldn't go through pciback, to retain proper layering. Ultimately it may be better to have even the map/unmap go through pciback (it's at least an apparent violation of the original physdev-op model that these two are XSM_DM_PRIV).
> >
> > Technically it should be possible to move this part to pciback, and in fact this is what I've considered in the first version of this series. But Roger points out on each version[1] of this series that pciback is meant to serve *PV* domains, where a PCI passthrough is a completely different different beast. In fact, I even consider that using pcifront in a Linux stubdomain as a proxy for qemu there may be a bad idea (one needs to be careful to avoid stubdomain kernel fighting with qemu about device state).
>
> Well, not using pciback _at all_ in this case would be another option. What I dislike is the furthering of hybrid-ness.

Ah, I see. This may be a good idea, if this type of PCI passthrough is going to stay. If we're going away from qemu towards the other options mentioned in the previous email, I'd say such a rework is too much work for its time-limited usefulness.

> > Anyway, if you all agree that pciback should be the way to go, I can go that route too. In practice, it would be a flag (set by the toolstack?) allowing writes to appropriate config space registers directly (with appropriate checks, as in this patch).
>
> I'm afraid I don't agree: How would allowing writes to more config space registers by a stubdom be safe?

Exactly the same as in this patch: pciback would perform the same validation (prohibit enabling MSI together with INTx etc).

BTW what are the risks (besides DoS) of allowing full config space access, assuming VT-d with interrupt remapping present? This sounds similar to risks of malicious device connected to some domU, right? Can such device (or a domain controlling such device) break out to Xen or dom0? Can it steal data from other domains?
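To make the "same validation in pciback" idea concrete, here is a hypothetical, self-contained sketch; the helper name, its parameters and the per-device flag it would sit behind are invented for illustration and are not existing pciback code. It only expresses the mutual-exclusion rule this patch enforces in the hypervisor (-EBUSY when a second delivery mechanism would be enabled):

#include <stdbool.h>

/*
 * Hypothetical check: may a config space write that enables one interrupt
 * delivery mechanism be forwarded to the device, given what is already
 * enabled?  At most one of INTx/MSI/MSI-X may be active at a time.
 */
static bool intr_enable_allowed(bool enabling_msi, bool enabling_msix,
                                bool enabling_intx,
                                bool msi_on, bool msix_on, bool intx_on)
{
    if (enabling_msi && (msix_on || intx_on))
        return false;
    if (enabling_msix && (msi_on || intx_on))
        return false;
    if (enabling_intx && (msi_on || msix_on))
        return false;

    return true;
}

The check itself is small either way; the open question in the thread is where it should live (pciback, Xen, or a future vPCI), not how to express it.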
On 23.09.2019 12:47, Marek Marczykowski-Górecki wrote:
> On Mon, Sep 23, 2019 at 09:58:27AM +0200, Jan Beulich wrote:
>> On 20.09.2019 18:02, Marek Marczykowski-Górecki wrote:
>>> Anyway, if you all agree that pciback should be the way to go, I can go that route too. In practice, it would be a flag (set by the toolstack?) allowing writes to appropriate config space registers directly (with appropriate checks, as in this patch).
>>
>> I'm afraid I don't agree: How would allowing writes to more config space registers by a stubdom be safe?
>
> Exactly the same as in this patch: pciback would perform the same validation (prohibit enabling MSI together with INTx etc).
>
> BTW what are the risks (besides DoS) of allowing full config space access, assuming VT-d with interrupt remapping present? This sounds similar to risks of malicious device connected to some domU, right? Can such device (or a domain controlling such device) break out to Xen or dom0? Can it steal data from other domains?

There shouldn't be, but this would need proving. The direction of proof then should be the other way around (and I realize it may be [close to] impossible): Widening what guests (including stub domains) are allowed to do should be proven to add no additional risks. It shouldn't be (by example, as I imply from your question) that an actual issue needs to be pointed out.

Jan
On Mon, Sep 23, 2019 at 02:05:58PM +0200, Jan Beulich wrote:
> On 23.09.2019 12:47, Marek Marczykowski-Górecki wrote:
> > On Mon, Sep 23, 2019 at 09:58:27AM +0200, Jan Beulich wrote:
> >> On 20.09.2019 18:02, Marek Marczykowski-Górecki wrote:
> >>> Anyway, if you all agree that pciback should be the way to go, I can go that route too. In practice, it would be a flag (set by the toolstack?) allowing writes to appropriate config space registers directly (with appropriate checks, as in this patch).
> >>
> >> I'm afraid I don't agree: How would allowing writes to more config space registers by a stubdom be safe?
> >
> > Exactly the same as in this patch: pciback would perform the same validation (prohibit enabling MSI together with INTx etc).
> >
> > BTW what are the risks (besides DoS) of allowing full config space access, assuming VT-d with interrupt remapping present? This sounds similar to risks of malicious device connected to some domU, right? Can such device (or a domain controlling such device) break out to Xen or dom0? Can it steal data from other domains?
>
> There shouldn't be, but this would need proving. The direction of proof then should be the other way around (and I realize it may be [close to] impossible): Widening what guests (including stub domains) are allowed to do should be proven to add no additional risks. It shouldn't be (by example, as I imply from your question) that an actual issue needs to be pointed out.

What about this: HVM guest can already do all of this when qemu is running in dom0. So, allowing those actions when qemu is running in stubdomain should not introduce _additional_ risks.
On 23.09.2019 14:25, Marek Marczykowski-Górecki wrote:
> What about this: HVM guest can already do all of this when qemu is running in dom0. So, allowing those actions when qemu is running in stubdomain should not introduce _additional_ risks.

Well, in a way - yes. But I don't think it's right to have qemu do the direct writes it does (and I wouldn't be surprised if there were still actual issues with this model). Hence it's not going to be an improvement if this suspicious underlying design got extended to other components.

Jan
On Mon, Sep 23, 2019 at 03:02:49PM +0200, Jan Beulich wrote:
> On 23.09.2019 14:25, Marek Marczykowski-Górecki wrote:
> > What about this: HVM guest can already do all of this when qemu is running in dom0. So, allowing those actions when qemu is running in stubdomain should not introduce _additional_ risks.
>
> Well, in a way - yes. But I don't think it's right to have qemu do the direct writes it does (and I wouldn't be surprised if there were still actual issues with this model). Hence it's not going to be an improvement if this suspicious underlying design got extended to other components.

This sounds like any workflow involving qemu would be inferior. And I agree with that. But also I do need PCI passthrough working, so I need a solution until we have an alternative implementation. If that alternative is going to happen soon, I'm also fine with carrying patches in the Qubes package (like I already do). This wouldn't be nice for the rest of the community (I believe many other Xen-based projects already carry similar patches), but it looks like upstreaming this is taking far more effort than it's worth for a temporary solution.

So, in the next version of this series I'm going to drop this patch (and the next one).
On Fri, Sep 20, 2019 at 06:02:50PM +0200, Marek Marczykowski-Górecki wrote:
> On Fri, Sep 20, 2019 at 12:10:09PM +0200, Jan Beulich wrote:
> > On 14.09.2019 17:37, Marek Marczykowski-Górecki wrote:
> > > Allow device model running in stubdomain to enable/disable INTx/MSI(-X), bypassing pciback. While pciback is still used to access config space from within stubdomain, it refuse to write to PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE in non-permissive mode. Which is the right thing to do for PV domain (the main use case for pciback), as PV domain should use XEN_PCI_OP_* commands for that. Unfortunately those commands are not good for stubdomain use, as they configure MSI in dom0's kernel too, which should not happen for HVM domain.
> >
> > Why the "for HVM domain" here? I.e. why would this be correct for a PV domain? Besides my dislike for such a bypass (imo all of the handling should go through pciback, or none of it) I continue to wonder whether the problem can't be addressed by a pciback change. And even if not, I'd still wonder whether the request shouldn't go through pciback, to retain proper layering. Ultimately it may be better to have even the map/unmap go through pciback (it's at least an apparent violation of the original physdev-op model that these two are XSM_DM_PRIV).
>
> Technically it should be possible to move this part to pciback, and in fact this is what I've considered in the first version of this series. But Roger points out on each version[1] of this series that pciback is meant to serve *PV* domains, where a PCI passthrough is a completely different different beast. In fact, I even consider that using pcifront in a Linux stubdomain as a proxy for qemu there may be a bad idea (one needs to be careful to avoid stubdomain kernel fighting with qemu about device state).

Right, it's (as shown by this series) tricky to proxy HVM passthrough over the PV pciif protocol used by pcifront and pciback, because that protocol was designed for PV guest PCI passthrough. While it's indeed possible to expand the pciif protocol so it's also suitable for proxying HVM passthrough by a QEMU stubdomain, that would require changes to Linux pciback at least (and maybe to pcifront?), and its usage would need to be limited to stubdomains only so as not to risk expanding the attack surface of pciback.

> Roger, what is the state of Xen internal vPCI? If handling PCI passthrough in Xen (or maybe standalone emulator), without qemu help is going to happen sooner than later (I guess not 4.13, but maybe 4.14?), then maybe this whole patch doesn't make sense as a temporary measure?

I've got an initial series posted to convert vPCI to an internal ioreq server, so it can co-exist with other ioreq servers that also trap accesses to the pci configuration space. Once that's done the main work will be to make vPCI safe for unprivileged domains. Right now vPCI is too permissive since it's designed for dom0 only.

I hope 4.14 will have at least experimental code for vPCI for domUs, but I cannot guarantee anything at this point.

Thanks,
Roger.
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index d630600..ecea91a 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -1443,6 +1443,51 @@ int pci_restore_msi_state(struct pci_dev *pdev)
     return 0;
 }
 
+int msi_control(struct pci_dev *pdev, bool msix, bool enable)
+{
+    unsigned int cap = msix ? PCI_CAP_ID_MSIX : PCI_CAP_ID_MSI;
+    unsigned int other_cap = msix ? PCI_CAP_ID_MSI : PCI_CAP_ID_MSIX;
+    uint16_t cmd;
+
+    if ( !use_msi )
+        return -EOPNOTSUPP;
+
+    if ( !pci_find_cap_offset(pdev->seg,
+                              pdev->bus,
+                              PCI_SLOT(pdev->devfn),
+                              PCI_FUNC(pdev->devfn),
+                              cap) )
+        return -ENODEV;
+
+    cmd = pci_conf_read16(pdev->sbdf, PCI_COMMAND);
+
+    /* don't allow enabling MSI(-X) and INTx at the same time */
+    if ( enable && ! (cmd & PCI_COMMAND_INTX_DISABLE) )
+        return -EBUSY;
+
+    /* don't allow enabling both MSI and MSI-X at the same time */
+    if ( enable && find_msi_entry(pdev, -1, other_cap) )
+        return -EBUSY;
+
+    if ( msix )
+        msix_set_enable(pdev, enable);
+    else
+        msi_set_enable(pdev, enable);
+
+    return 0;
+}
+
+int intx_control(struct pci_dev *pdev, bool enable)
+{
+    /* don't allow enabling INTx if MSI(-X) is already enabled */
+    if ( enable && find_msi_entry(pdev, -1, PCI_CAP_ID_MSI) )
+        return -EBUSY;
+    if ( enable && find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX) )
+        return -EBUSY;
+    pci_intx(pdev, enable);
+    return 0;
+}
+
 void __init early_msi_init(void)
 {
     if ( use_msi < 0 )
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 3a3c158..7b71039 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -662,6 +662,59 @@ ret_t do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         break;
     }
 
+    case PHYSDEVOP_interrupt_control: {
+        struct physdev_interrupt_control op;
+        struct pci_dev *pdev;
+        int intr_type;
+        bool enable;
+
+        ret = -EFAULT;
+        if ( copy_from_guest(&op, arg, 1) )
+            break;
+
+        ret = -EINVAL;
+        if ( op.flags & ~(PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK |
+                          PHYSDEVOP_INTERRUPT_CONTROL_ENABLE) )
+            break;
+
+        intr_type = op.flags & PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK;
+        enable = op.flags & PHYSDEVOP_INTERRUPT_CONTROL_ENABLE;
+
+        pcidevs_lock();
+        pdev = pci_get_pdev(op.seg, op.bus, op.devfn);
+        ret = -ENODEV;
+        /* explicitly exclude hidden devices */
+        if ( !pdev || pdev->domain == dom_xen )
+            goto pci_unlock;
+
+        ret = xsm_interrupt_control(XSM_DM_PRIV,
+                                    pdev->domain,
+                                    pdev->sbdf.sbdf,
+                                    intr_type,
+                                    enable);
+        if ( ret )
+            goto pci_unlock;
+
+        switch ( intr_type )
+        {
+        case PHYSDEVOP_INTERRUPT_CONTROL_INTX:
+            ret = intx_control(pdev, enable);
+            break;
+        case PHYSDEVOP_INTERRUPT_CONTROL_MSI:
+            ret = msi_control(pdev, false, enable);
+            break;
+        case PHYSDEVOP_INTERRUPT_CONTROL_MSIX:
+            ret = msi_control(pdev, true, enable);
+            break;
+        default:
+            ret = -EINVAL;
+            break;
+        }
+pci_unlock:
+        pcidevs_unlock();
+        break;
+    }
+
     default:
         ret = -ENOSYS;
         break;
diff --git a/xen/arch/x86/x86_64/physdev.c b/xen/arch/x86/x86_64/physdev.c
index c5a00ea..6e0e488 100644
--- a/xen/arch/x86/x86_64/physdev.c
+++ b/xen/arch/x86/x86_64/physdev.c
@@ -76,6 +76,10 @@ CHECK_physdev_pci_device_add
 CHECK_physdev_pci_device
 #undef xen_physdev_pci_device
 
+#define xen_physdev_interrupt_control physdev_interrupt_control
+CHECK_physdev_interrupt_control
+#undef xen_physdev_interrupt_control
+
 #define COMPAT
 #undef guest_handle_okay
 #define guest_handle_okay compat_handle_okay
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 10387dc..4c13e6b 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -252,5 +252,7 @@ void guest_mask_msi_irq(struct irq_desc *, bool mask);
 void ack_nonmaskable_msi_irq(struct irq_desc *);
 void end_nonmaskable_msi_irq(struct irq_desc *, u8 vector);
 void set_msi_affinity(struct irq_desc *, const cpumask_t *);
+int msi_control(struct pci_dev *pdev, bool msix, bool enable);
+int intx_control(struct pci_dev *pdev, bool enable);
 
 #endif /* __ASM_MSI_H */
diff --git a/xen/include/public/physdev.h b/xen/include/public/physdev.h
index b6faf83..689c11e 100644
--- a/xen/include/public/physdev.h
+++ b/xen/include/public/physdev.h
@@ -345,6 +345,29 @@ typedef struct physdev_dbgp_op physdev_dbgp_op_t;
 DEFINE_XEN_GUEST_HANDLE(physdev_dbgp_op_t);
 
 /*
+ * Choose which interrupt type to control. If neither MSI nor MSI-X is chosen,
+ * will apply to INTx - for convenience define PHYSDEVOP_INTERRUPT_CONTROL_INTX
+ * and PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK
+ */
+#define PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK 3
+#define PHYSDEVOP_INTERRUPT_CONTROL_INTX 0
+#define PHYSDEVOP_INTERRUPT_CONTROL_MSI 1
+#define PHYSDEVOP_INTERRUPT_CONTROL_MSIX 2
+/* when PHYSDEVOP_INTERRUPT_CONTROL_ENABLE not set, disable */
+#define PHYSDEVOP_INTERRUPT_CONTROL_ENABLE 4
+
+#define PHYSDEVOP_interrupt_control 32
+struct physdev_interrupt_control {
+    /* IN */
+    uint16_t seg;
+    uint8_t bus;
+    uint8_t devfn;
+    uint16_t flags;
+};
+typedef struct physdev_interrupt_control physdev_interrupt_control_t;
+DEFINE_XEN_GUEST_HANDLE(physdev_interrupt_control_t);
+
+/*
  * Notify that some PIRQ-bound event channels have been unmasked.
  * ** This command is obsolete since interface version 0x00030202 and is **
  * ** unsupported by newer versions of Xen.                              **
diff --git a/xen/include/xlat.lst b/xen/include/xlat.lst
index 95f5e55..18af663 100644
--- a/xen/include/xlat.lst
+++ b/xen/include/xlat.lst
@@ -104,6 +104,7 @@
 !	vnuma_topology_info		memory.h
 ?	physdev_eoi			physdev.h
 ?	physdev_get_free_pirq		physdev.h
+?	physdev_interrupt_control	physdev.h
 ?	physdev_irq			physdev.h
 ?	physdev_irq_status_query	physdev.h
 ?	physdev_manage_pci		physdev.h
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index ef52bb1..5a758c5 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -514,6 +514,13 @@ static XSM_INLINE int xsm_pci_config_permission(XSM_DEFAULT_ARG struct domain *d
     return xsm_default_action(action, current->domain, d);
 }
 
+static XSM_INLINE int xsm_interrupt_control(XSM_DEFAULT_ARG struct domain *d, uint32_t machine_bdf,
+                                            uint8_t intr_type, uint8_t enable)
+{
+    XSM_ASSERT_ACTION(XSM_DM_PRIV);
+    return xsm_default_action(action, current->domain, d);
+}
+
 static XSM_INLINE int xsm_add_to_physmap(XSM_DEFAULT_ARG struct domain *d1, struct domain *d2)
 {
     XSM_ASSERT_ACTION(XSM_TARGET);
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index e22d616..f080189 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -106,6 +106,7 @@ struct xsm_operations {
     int (*iomem_permission) (struct domain *d, uint64_t s, uint64_t e, uint8_t allow);
     int (*iomem_mapping) (struct domain *d, uint64_t s, uint64_t e, uint8_t allow);
     int (*pci_config_permission) (struct domain *d, uint32_t machine_bdf, uint16_t start, uint16_t end, uint8_t access);
+    int (*interrupt_control) (struct domain *d, uint32_t machine_bdf, uint8_t intr_type, uint8_t enable);
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_PCI)
     int (*get_device_group) (uint32_t machine_bdf);
@@ -464,6 +465,11 @@ static inline int xsm_pci_config_permission (xsm_default_t def, struct domain *d
     return xsm_ops->pci_config_permission(d, machine_bdf, start, end, access);
 }
 
+static inline int xsm_interrupt_control (xsm_default_t def, struct domain *d, uint32_t machine_bdf, uint8_t msix, uint8_t enable)
+{
+    return xsm_ops->interrupt_control(d, machine_bdf, msix, enable);
+}
+
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_PCI)
 static inline int xsm_get_device_group(xsm_default_t def, uint32_t machine_bdf)
 {
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 5705e52..3080ae7 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -81,6 +81,7 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, iomem_permission);
     set_to_dummy_if_null(ops, iomem_mapping);
     set_to_dummy_if_null(ops, pci_config_permission);
+    set_to_dummy_if_null(ops, interrupt_control);
     set_to_dummy_if_null(ops, get_vnumainfo);
 
 #if defined(CONFIG_HAS_PASSTHROUGH) && defined(CONFIG_HAS_PCI)
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 791c1f6..ee2fc52 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1083,6 +1083,29 @@ static int flask_pci_config_permission(struct domain *d, uint32_t machine_bdf, u
 
 }
 
+static int flask_interrupt_control(struct domain *d, uint32_t machine_bdf, uint8_t type, uint8_t enable)
+{
+    uint32_t dsid, rsid;
+    int rc = -EPERM;
+    struct avc_audit_data ad;
+    uint32_t perm;
+
+    AVC_AUDIT_DATA_INIT(&ad, DEV);
+    ad.device = machine_bdf;
+
+    rc = security_device_sid(machine_bdf, &rsid);
+    if ( rc )
+        return rc;
+
+    rc = avc_current_has_perm(rsid, SECCLASS_RESOURCE, RESOURCE__SETUP, &ad);
+    if ( rc )
+        return rc;
+
+    perm = flask_iommu_resource_use_perm();
+    dsid = domain_sid(d);
+    return avc_has_perm(dsid, rsid, SECCLASS_RESOURCE, perm, &ad);
+}
+
 static int flask_resource_plug_core(void)
 {
     return avc_current_has_perm(SECINITSID_DOMXEN, SECCLASS_RESOURCE, RESOURCE__PLUG, NULL);
@@ -1800,6 +1823,7 @@ static struct xsm_operations flask_ops = {
     .iomem_permission = flask_iomem_permission,
     .iomem_mapping = flask_iomem_mapping,
     .pci_config_permission = flask_pci_config_permission,
+    .interrupt_control = flask_interrupt_control,
 
     .resource_plug_core = flask_resource_plug_core,
     .resource_unplug_core = flask_resource_unplug_core,
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 194d743..82eaeac 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -466,6 +466,7 @@ class resource
 # checked for PHYSDEVOP_restore_msi* (target PCI device)
 # checked for PHYSDEVOP_setup_gsi (target IRQ)
 # checked for PHYSDEVOP_pci_mmcfg_reserved (target xen_t)
+# checked for PHYSDEVOP_interrupt_control (target PCI device)
     setup
 }
Allow device model running in stubdomain to enable/disable INTx/MSI(-X), bypassing pciback. While pciback is still used to access config space from within stubdomain, it refuses to write to PCI_MSI_FLAGS_ENABLE/PCI_MSIX_FLAGS_ENABLE/PCI_COMMAND_INTX_DISABLE in non-permissive mode. This is the right thing to do for a PV domain (the main use case for pciback), as a PV domain should use XEN_PCI_OP_* commands for that. Unfortunately those commands are not suitable for stubdomain use, as they configure MSI in dom0's kernel too, which should not happen for an HVM domain.

This new physdevop is allowed only for a stubdomain controlling the domain which owns the device.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
Changes in v3:
  - new patch
Changes in v4:
  - adjust code style
  - s/msi_msix/msi/
  - add msi_set_enable XSM hook
  - flatten struct physdev_msi_set_enable
  - add to include/xlat.lst
Changes in v5:
  - rename to PHYSDEVOP_msi_control
  - combine "mode" and "enable" into "flags"
  - refuse to enable both MSI and MSI-X, and also to enable MSI(-X) on an incapable device
  - disable/enable INTx when enabling/disabling MSI (?)
  - refuse if !use_msi
  - adjust flask hook to make more sense (require "setup" access on device, not on domain)
  - rebase on master
Changes in v6:
  - rename to PHYSDEVOP_interrupt_control
  - extend with INTx control
  - ensure that MSI(-X) can't be enabled together with INTx and the other MSI(-X)
  - deduplicate code in msi_control
  - explicitly refuse to operate on hidden devices
  - expand flags to uint16_t to avoid implicit padding

I'm not sure if the XSM part is correct; compile-tested only, as I'm not sure how to set the policy.
---
 xen/arch/x86/msi.c                  | 45 +++++++++++++++++++++++++-
 xen/arch/x86/physdev.c              | 53 ++++++++++++++++++++++++++++++-
 xen/arch/x86/x86_64/physdev.c       |  4 ++-
 xen/include/asm-x86/msi.h           |  2 +-
 xen/include/public/physdev.h        | 23 +++++++++++++-
 xen/include/xlat.lst                |  1 +-
 xen/include/xsm/dummy.h             |  7 ++++-
 xen/include/xsm/xsm.h               |  6 +++-
 xen/xsm/dummy.c                     |  1 +-
 xen/xsm/flask/hooks.c               | 24 ++++++++++++++-
 xen/xsm/flask/policy/access_vectors |  1 +-
 11 files changed, 167 insertions(+)
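For reference, a caller-side sketch of how the new hypercall argument's flags field is encoded, using the constants this patch adds to xen/include/public/physdev.h. The helper below is made up for illustration, and the actual hypercall invocation from the device model (e.g. via the stubdomain's privcmd plumbing) is intentionally left out:

#include <stdint.h>

/* Mirrors the additions to xen/include/public/physdev.h in this patch. */
#define PHYSDEVOP_INTERRUPT_CONTROL_TYPE_MASK 3
#define PHYSDEVOP_INTERRUPT_CONTROL_INTX      0
#define PHYSDEVOP_INTERRUPT_CONTROL_MSI       1
#define PHYSDEVOP_INTERRUPT_CONTROL_MSIX      2
#define PHYSDEVOP_INTERRUPT_CONTROL_ENABLE    4

struct physdev_interrupt_control {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
    uint16_t flags;
};

/* Hypothetical helper: build the argument for "enable MSI-X on seg:bus:devfn". */
static struct physdev_interrupt_control make_msix_enable(uint16_t seg,
                                                         uint8_t bus,
                                                         uint8_t devfn)
{
    struct physdev_interrupt_control op = {
        .seg = seg,
        .bus = bus,
        .devfn = devfn,
        /* Interrupt type in bits 0-1, enable/disable in bit 2. */
        .flags = PHYSDEVOP_INTERRUPT_CONTROL_MSIX |
                 PHYSDEVOP_INTERRUPT_CONTROL_ENABLE,
    };

    return op;
}

Clearing PHYSDEVOP_INTERRUPT_CONTROL_ENABLE while keeping the same type value requests the corresponding disable, and a type of PHYSDEVOP_INTERRUPT_CONTROL_INTX selects INTx.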