
[1/3] libxl: attach xen-pciback only to PV domains

Message ID 1476755613-3921-2-git-send-email-marmarek@invisiblethingslab.com (mailing list archive)
State New, archived

Commit Message

Marek Marczykowski-Górecki Oct. 18, 2016, 1:53 a.m. UTC
HVM domains use the IOMMU and device model assistance to communicate with
PCI devices; xen-pcifront/pciback is used only in PV domains.
When an HVM domain has its device model in a stubdomain, attaching
xen-pciback to the target domain itself is not only useless, but may also
prevent attaching xen-pciback to the stubdomain, effectively breaking PCI
passthrough.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
---
 tools/libxl/libxl_pci.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Konrad Rzeszutek Wilk Oct. 18, 2016, 8:52 p.m. UTC | #1
On Tue, Oct 18, 2016 at 03:53:31AM +0200, Marek Marczykowski-Górecki wrote:
> HVM domains use IOMMU and device model assistance for communicating with
> PCI devices, xen-pcifront/pciback is used only in PV domains.
> When HVM domain has device model in stubdomain, attaching xen-pciback to
> the target domain itself is not only useless, but also may prevent
> attaching xen-pciback to the stubdomain, effectively breaking PCI
> passthrough.

This has the consequence that the "reset" of the device that
pciback does will no longer be done.

That is, the FLR functionality will not be exercised anymore.

Marek Marczykowski-Górecki Oct. 18, 2016, 9:03 p.m. UTC | #2
On Tue, Oct 18, 2016 at 04:52:29PM -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Oct 18, 2016 at 03:53:31AM +0200, Marek Marczykowski-Górecki wrote:
> > HVM domains use IOMMU and device model assistance for communicating with
> > PCI devices, xen-pcifront/pciback is used only in PV domains.
> > When HVM domain has device model in stubdomain, attaching xen-pciback to
> > the target domain itself is not only useless, but also may prevent
> > attaching xen-pciback to the stubdomain, effectively breaking PCI
> > passthrough.
> 
> This has the consequence that the "reset" of the device that
> pciback does will no longer be done.
> 
> That is the FLR functionality will not be exercised anymore.

Are you sure about that? libxl__device_pci_add calls
libxl__device_pci_reset, regardless of my patch.

Wei Liu Oct. 19, 2016, 9:37 a.m. UTC | #3
On Tue, Oct 18, 2016 at 03:53:31AM +0200, Marek Marczykowski-Górecki wrote:
> HVM domains use IOMMU and device model assistance for communicating with
> PCI devices, xen-pcifront/pciback is used only in PV domains.

This bit of description is in line with my understanding of how PCI
passthrough works.

> When HVM domain has device model in stubdomain, attaching xen-pciback to
> the target domain itself is not only useless, but also may prevent
> attaching xen-pciback to the stubdomain, effectively breaking PCI
> passthrough.
> 
> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> ---
>  tools/libxl/libxl_pci.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
> index 6f8f49c..2ae1bc4 100644
> --- a/tools/libxl/libxl_pci.c
> +++ b/tools/libxl/libxl_pci.c
> @@ -1111,7 +1111,7 @@ out:
>          }
>      }
>  
> -    if (!starting)
> +    if (!starting && !hvm)
>          rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
>      else
>          rc = 0;
> @@ -1306,7 +1306,8 @@ static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid,
>          }
>      }
>  
> -    if (d_config->num_pcidevs > 0) {
> +    if (d_config->num_pcidevs > 0
> +            && d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV) {

Please move the indentation forward.

>          rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs,
>              d_config->num_pcidevs);
>          if (rc < 0) {
Konrad Rzeszutek Wilk Oct. 19, 2016, 8:46 p.m. UTC | #4
On Wed, Oct 19, 2016 at 10:37:52AM +0100, Wei Liu wrote:
> On Tue, Oct 18, 2016 at 03:53:31AM +0200, Marek Marczykowski-Górecki wrote:
> > HVM domains use IOMMU and device model assistance for communicating with
> > PCI devices, xen-pcifront/pciback is used only in PV domains.
> 
> This bit of description is in line with my understanding of how PCI
> passthrough works.

Kind of. Pciback is also used to "own" the PCI devices. In fact, it
does the important job of resetting the PCI device when the device
is bound to pciback:

echo <BDF> > bind

And, this is the important part, when the device changes ownership,
that is, when you disconnect it from one guest and assign it to another.
You need to reset the device in between. The code that calls
pci_reset_function is invoked from:

}

/*
 * Called when:
 *  - XenBus state has been reconfigure (pci unplug). See xen_pcibk_remove_device
 *  - XenBus state has been disconnected (guest shutdown). See xen_pcibk_xenbus_remove
 *  - 'echo BDF > unbind' on pciback module with no guest attached. See pcistub_remove
 *  - 'echo BDF > unbind' with a guest still using it. See pcistub_remove
 *
 *  As such we have to be careful.
 *
 *  To make this easier, the caller has to hold the device lock.
 */
void pcistub_put_pci_dev(struct pci_dev *dev)

The first two happen only while the XenStore 'pci' entries are active. This
patch removes those entries, which introduces a potential security problem.

Unless libxl does an 'unbind' followed by a 'bind'?
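
The 'unbind' followed by 'bind' cycle asked about above can be sketched in C
roughly as follows. This is only an illustration: write_attr and
pciback_rebind are hypothetical helpers, not libxl functions, and the driver
directory is passed in by the caller (on a typical Linux dom0 it would
presumably be /sys/bus/pci/drivers/pciback, which is an assumption here).
Whether such a rebind actually triggers the reset path is exactly the open
question in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write `text` to the attribute file at `path`, much like
 * `echo -n text > path`. Returns 0 on success, -1 on any error. */
static int write_attr(const char *path, const char *text)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(text, f) != EOF;
    if (fclose(f) != 0)     /* buffered data may only be written on close */
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper (not a libxl function): write the BDF to
 * <driver_dir>/unbind and then to <driver_dir>/bind. */
static int pciback_rebind(const char *driver_dir, const char *bdf)
{
    char path[512];
    int n;

    assert(driver_dir != NULL && bdf != NULL);
    if (strlen(bdf) == 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/unbind", driver_dir);
    if (n < 0 || (size_t)n >= sizeof(path) || write_attr(path, bdf) != 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/bind", driver_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return write_attr(path, bdf);
}
```

A caller would invoke something like pciback_rebind(driver_dir, "0000:03:00.0")
while no guest owns the device; the sketch stops at the first failing write
and does not try to restore the previous binding.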

Marek Marczykowski-Górecki Oct. 19, 2016, 10:42 p.m. UTC | #5
On Wed, Oct 19, 2016 at 04:46:26PM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 19, 2016 at 10:37:52AM +0100, Wei Liu wrote:
> > On Tue, Oct 18, 2016 at 03:53:31AM +0200, Marek Marczykowski-Górecki wrote:
> > > HVM domains use IOMMU and device model assistance for communicating with
> > > PCI devices, xen-pcifront/pciback is used only in PV domains.
> > 
> > This bit of description is in line with my understanding of how PCI
> > passthrough works.
> 
> Kind of. Pciback is also used to "own" the PCI devices. And in fact
> they do an important job of resetting the PCI device when the
> device is "bind" to pciback:
> 
> echo <Bdf> > bind

This part is still done.

> And .. this is the important part - when device changes ownership.
> That is when you disconnect it from one guest and assign to another.
> You need to reset the device in between. The code that calls
> the pci_reset_function is called by:
> 
> }                                                                               
>                                                                                 
> /*                                                                              
>  * Called when:                                                                 
>  *  - XenBus state has been reconfigure (pci unplug). See xen_pcibk_remove_device
>  *  - XenBus state has been disconnected (guest shutdown). See xen_pcibk_xenbus_remove

But this, in case of HVM without stubdomain, is not.

>  *  - 'echo BDF > unbind' on pciback module with no guest attached. See pcistub_remove
>  *  - 'echo BDF > unbind' with a guest still using it. See pcistub_remove       
>  *                                                                              
>  *  As such we have to be careful.                                              
>  *                                                                              
>  *  To make this easier, the caller has to hold the device lock.                
>  */                                                                             
> void pcistub_put_pci_dev(struct pci_dev *dev)
> 
> The first two are done when XenStore 'pci' entries are active - which
> this patch will remove and introduce a potential security problem.
> 
> Unless libxl does an 'unbind' followed by an 'bind'?

What about libxl__device_pci_reset, which is called (at least) before
attaching a device to some domain, even after my patch and even if the
device is already bound to pciback? It tries to reset the device using
the 'reset' entry in sysfs. I see this isn't available for some devices -
can pci_reset_function do any better?
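
For illustration, a reset through the sysfs 'reset' entry could look roughly
like the C sketch below. It is not the libxl implementation: pci_sysfs_reset
and the reset_result enum are hypothetical names, and the devices directory
is a parameter (presumably /sys/bus/pci/devices on Linux, an assumption
here). The sketch reports a missing 'reset' entry separately, since that is
the "isn't available for some devices" case mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum reset_result {
    RESET_OK = 0,          /* "1" was written to the reset entry */
    RESET_UNSUPPORTED = 1, /* the device has no reset entry */
    RESET_FAILED = 2       /* any other error */
};

/* Hypothetical helper (not a libxl function): write "1" to
 * <devices_dir>/<bdf>/reset without creating the file if it is missing. */
static enum reset_result pci_sysfs_reset(const char *devices_dir,
                                         const char *bdf)
{
    char path[512];
    ssize_t written;
    int fd, n;

    assert(devices_dir != NULL && bdf != NULL);

    n = snprintf(path, sizeof(path), "%s/%s/reset", devices_dir, bdf);
    if (n < 0 || (size_t)n >= sizeof(path))
        return RESET_FAILED;

    /* O_WRONLY without O_CREAT: a missing entry shows up as ENOENT. */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return errno == ENOENT ? RESET_UNSUPPORTED : RESET_FAILED;

    written = write(fd, "1", 1);
    if (close(fd) != 0 || written != 1)
        return RESET_FAILED;
    return RESET_OK;
}
```

A caller that gets RESET_UNSUPPORTED would need some other way to reset the
device before reassigning it; which mechanism that should be is left open
here, as it is in the discussion.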


Konrad Rzeszutek Wilk Oct. 25, 2016, 1:10 p.m. UTC | #6
On Thu, Oct 20, 2016 at 12:42:33AM +0200, Marek Marczykowski-Górecki wrote:
> What about libxl__device_pci_reset, which is called (at least) before
> attaching device to some domain, even after my patch and even if the
> device is already bound to pciback. It tries to reset the device using
> 'reset' entry in sysfs. I see this isn't available for some devices -
> can pci_reset_function do any better?

My vague recollection was that it tried to do it but it aborted
earlier due to holding locks (dev_lock is held when you do any
operation on the SysFS). But I may be forgetting the details.

I need to look in the Linux code to confirm what the tricky part was.

Marek Marczykowski-Górecki Oct. 25, 2016, 7:22 p.m. UTC | #7
On Tue, Oct 25, 2016 at 09:10:02AM -0400, Konrad Rzeszutek Wilk wrote:
> My vague recollection was that it tried to do it but it aborted
> earlier due to holding locks (dev_lock is held when you do any
> operation on the SysFS). But I may be forgetting the details.
> 
> I need to look in the Linux code to confirm what the tricky part was.

Thanks. This is the last thing holding me back from sending v2.

Anyway, if attaching xen-pciback to /something/ is needed, how should it
look? We have 3 cases:
1. PV - without qemu
2. HVM - with qemu in dom0
3. HVM - with qemu in stubdomain
And soon there will be a 4th: PVH - without qemu

For 1 and 4 the device should be attached (in terms of xenstore) to the
target domain, as xen-pcifront (or equivalent) running there will be
used. BTW, is that true for PVHv2?
For 3, it should be attached to the stubdomain (which is already the case).
The question is what to do about 2: should it be attached to the target
domain, even though it will not be used?
Andrew Cooper Oct. 25, 2016, 7:42 p.m. UTC | #8
On 25/10/16 20:22, Marek Marczykowski-Górecki wrote:
> Thanks. This is the last thing holding me from sending v2.
>
> Anyway, if attaching xen-pciback to /something/ is needed, how should it
> look? We have 3 cases:
> 1. PV - without qemu
> 2. HVM - with qemu in dom0
> 3. HVM - with qemu in stubdomain
> And soon there will be 4th: PVH - without qemu
>
> For 1 and 4 the device should be attached (in terms of xenstore) to the
> target domain, as xen-pcifront (or equivalent) running there will be
> used. BTW is that true for PVHv2?
> For 3 - it should be attached to stubdomain (which is the case).
> The question is what about 2 - should it be attached to the target domain,
> even though it will not be used?

PVH(v2) is a little complicated.  For dom0 support, there are some bits
of basic bridge emulation moving into the hypervisor so qemu isn't
required at all.  In practice, this means that SRIOV passthrough to
plain PVH(v2) domU's will also work without qemu.

There are specific plans to not use pcifront in PVH(v2) guests, making
it closer to how real hardware works.  If however there are complicated
bits of faking up required (e.g. graphics IO BARs so Windows doesn't
refuse to load the driver), then that will be better relegated to a very
small ioreq server driver in dom0, similar to how demu currently works.

As for the other cases.  The one and only legitimate case where a guest
can find any information about its pci devices in xenstore is PV
guests.  HVM guests must under no circumstances be in a position to use
pci-front.  Both pci-back and qemu have their own model of PCI state,
and an HVM guest must not be able to do things like half an update via
one method and another half via the other.

It is unfortunate that xen-pciback has dual unrelated functionality. 
The "binding to arbitrary devices" should be split out into a separate
device, leaving xen-pciback as only the back half of the shared protocol.

~Andrew
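
One possible reading of the cases discussed in this thread, written down as
a small C sketch. All names here (guest_kind, pci_xs_target,
pci_xenstore_target) are hypothetical and do not exist in libxl. The mapping
combines the statement that only PV guests may see their PCI devices in
xenstore with the earlier remark that, for an HVM guest with a stubdomain
device model, the device is attached to the stubdomain. Treating PVH as
"no xenstore entries" follows the stated plan not to use pcifront there and
is an assumption, not settled behaviour.

```c
#include <assert.h>

/* Hypothetical classification of the four cases listed in the thread. */
enum guest_kind {
    GUEST_PV,            /* 1. PV, no qemu */
    GUEST_HVM_QEMU_DOM0, /* 2. HVM, qemu in dom0 */
    GUEST_HVM_STUBDOM,   /* 3. HVM, qemu in a stubdomain */
    GUEST_PVH            /* 4. PVH, no qemu */
};

/* Where the xen-pciback xenstore entries would be created. */
enum pci_xs_target {
    XS_NONE,
    XS_TARGET_DOMAIN,
    XS_STUBDOMAIN
};

/* Sketch of the policy under the assumptions stated above. */
static enum pci_xs_target pci_xenstore_target(enum guest_kind kind,
                                              int num_pcidevs)
{
    assert(num_pcidevs >= 0);
    if (num_pcidevs == 0)
        return XS_NONE;

    switch (kind) {
    case GUEST_PV:
        return XS_TARGET_DOMAIN; /* pcifront in the guest uses the entries */
    case GUEST_HVM_STUBDOM:
        return XS_STUBDOMAIN;    /* the stubdomain gets the entries */
    case GUEST_HVM_QEMU_DOM0:
    case GUEST_PVH:
        return XS_NONE;          /* guest must not see pci-front entries */
    }
    return XS_NONE;
}
```

Note that this sketch says nothing about who resets the device when it
changes owner; that concern is separate from where the xenstore entries live.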

Patch

diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index 6f8f49c..2ae1bc4 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -1111,7 +1111,7 @@  out:
         }
     }
 
-    if (!starting)
+    if (!starting && !hvm)
         rc = libxl__device_pci_add_xenstore(gc, domid, pcidev, starting);
     else
         rc = 0;
@@ -1306,7 +1306,8 @@  static void libxl__add_pcidevs(libxl__egc *egc, libxl__ao *ao, uint32_t domid,
         }
     }
 
-    if (d_config->num_pcidevs > 0) {
+    if (d_config->num_pcidevs > 0
+            && d_config->c_info.type == LIBXL_DOMAIN_TYPE_PV) {
         rc = libxl__create_pci_backend(gc, domid, d_config->pcidevs,
             d_config->num_pcidevs);
         if (rc < 0) {