Message ID | 20170410134313.GE21201@char.us.oracle.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Apr 10, 2017 at 09:43:13AM -0400, Konrad Rzeszutek Wilk wrote: [...] > > .. Except that we need some way of doing FLR and Pciback > is the one doing it. > > The right way would be to expand pciback to support the do_flr ioctl > and combine it with your patch. > OK. Does that mean the bug is in Linux and I should just take Ross's patch as-is for 4.9? Wei.
On Wed, Apr 12, 2017 at 04:46:33PM +0100, Wei Liu wrote: > On Mon, Apr 10, 2017 at 09:43:13AM -0400, Konrad Rzeszutek Wilk wrote: > [...] > > > > .. Except that we need some way of doing FLR and Pciback > > is the one doing it. > > > > The right way would be to expand pciback to support the do_flr ioctl > > and combine it with your patch. > > > > OK. Does that mean the bug is in Linux and I should just take Ross's Please don't. Or if you would like I can do Nacked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > patch as-is for 4.9? > > Wei.
On Wed, Apr 12, 2017 at 04:21:42PM -0400, Konrad Rzeszutek Wilk wrote: > On Wed, Apr 12, 2017 at 04:46:33PM +0100, Wei Liu wrote: > > On Mon, Apr 10, 2017 at 09:43:13AM -0400, Konrad Rzeszutek Wilk wrote: > > [...] > > > > > > .. Except that we need some way of doing FLR and Pciback > > > is the one doing it. > > > > > > The right way would be to expand pciback to support the do_flr ioctl > > > and combine it with your patch. > > > > > > > OK. Does that mean the bug is in Linux and I should just take Ross's > > Please don't. > > Or if you would like I can do > > Nacked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > Please correct me if I'm wrong: current libxl code is broken in one way and taking that patch means to break it in another way? What is the way forward? Do you want me to wait until those kernel patches are upstreamed? Wei. > > > patch as-is for 4.9? > > > > Wei.
On Thu, Apr 13, 2017 at 09:41:03AM +0100, Wei Liu wrote: > On Wed, Apr 12, 2017 at 04:21:42PM -0400, Konrad Rzeszutek Wilk wrote: > > On Wed, Apr 12, 2017 at 04:46:33PM +0100, Wei Liu wrote: > > > On Mon, Apr 10, 2017 at 09:43:13AM -0400, Konrad Rzeszutek Wilk wrote: > > > [...] > > > > > > > > .. Except that we need some way of doing FLR and Pciback > > > > is the one doing it. > > > > > > > > The right way would be to expand pciback to support the do_flr ioctl > > > > and combine it with your patch. > > > > > > > > > > OK. Does that mean the bug is in Linux and I should just take Ross's > > > > Please don't. > > > > Or if you would like I can do > > > > Nacked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > > > Please correct me if I'm wrong: current libxl code is broken in one way and > taking that patch means to break it in another way? > > What is the way forward? Do you want me to wait until those kernel > patches are upstreamed? Julien was looking into moving all this reset logic into the hypervisor itself, so that ARM and PVHv2 passthrough doesn't use pciback at all. Maybe it could be done now? AFAICT this would solve the issue under discussion here. Roger.
On Thu, 13 Apr 2017, Roger Pau Monné wrote: > On Thu, Apr 13, 2017 at 09:41:03AM +0100, Wei Liu wrote: > > On Wed, Apr 12, 2017 at 04:21:42PM -0400, Konrad Rzeszutek Wilk wrote: > > > On Wed, Apr 12, 2017 at 04:46:33PM +0100, Wei Liu wrote: > > > > On Mon, Apr 10, 2017 at 09:43:13AM -0400, Konrad Rzeszutek Wilk wrote: > > > > [...] > > > > > > > > > > .. Except that we need some way of doing FLR and Pciback > > > > > is the one doing it. > > > > > > > > > > The right way would be to expand pciback to support the do_flr ioctl > > > > > and combine it with your patch. > > > > > > > > > > > > > OK. Does that mean the bug is in Linux and I should just take Ross's > > > > > > Please don't. > > > > > > Or if you would like I can do > > > > > > Nacked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > > > > > > Please correct me if I'm wrong: current libxl code is broken in one way and > > taking that patch means to break it in another way? > > > > What is the way forward? Do you want me to wait until those kernel > > patches are upstreamed? > > Julien was looking into moving all this reset logic into the hypervisor itself, > so that ARM and PVHv2 passthrough doesn't use pciback at all. Maybe it could be > done now? AFAICT this would solve the issue under discussion here. That is a medium to long term goal and I wouldn't rely on it to fix an outstanding bug.
On April 13, 2017 4:41:03 AM EDT, Wei Liu <wei.liu2@citrix.com> wrote: >On Wed, Apr 12, 2017 at 04:21:42PM -0400, Konrad Rzeszutek Wilk wrote: >> On Wed, Apr 12, 2017 at 04:46:33PM +0100, Wei Liu wrote: >> > On Mon, Apr 10, 2017 at 09:43:13AM -0400, Konrad Rzeszutek Wilk >wrote: >> > [...] >> > > >> > > .. Except that we need some way of doing FLR and Pciback >> > > is the one doing it. >> > > >> > > The right way would be to expand pciback to support the do_flr >ioctl >> > > and combine it with your patch. >> > > >> > >> > OK. Does that mean the bug is in Linux and I should just take >Ross's >> >> Please don't. >> >> Or if you would like I can do >> >> Nacked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >> > >Please correct me if I'm wrong: current libxl code is broken in one way >and >taking that patch means to break it in another way? > >What is the way forward? Do you want me to wait until those kernel >patches are upstreamed? Yes, as we cannot allow PCI pass through devices to not have an FLR or proper configuration register reset when a passing a device to a guest (or from one guest to another). I can figure something up next week on how to restructure the pciback module. > >Wei. > >> >> > patch as-is for 4.9? >> > >> > Wei. Thanks!
diff --git a/Documentation/ABI/testing/sysfs-driver-pciback b/Documentation/ABI/testing/sysfs-driver-pciback index 6a733bf..2d4e32f 100644 --- a/Documentation/ABI/testing/sysfs-driver-pciback +++ b/Documentation/ABI/testing/sysfs-driver-pciback @@ -11,3 +11,15 @@ Description: #echo 00:19.0-E0:2:FF > /sys/bus/pci/drivers/pciback/quirks will allow the guest to read and write to the configuration register 0x0E. + + +What: /sys/bus/pci/drivers/pciback/do_flr +Date: December 2014 +KernelVersion: 3.19 +Contact: xen-devel@lists.xenproject.org +Description: + An option to slot or bus reset an PCI device owned by + Xen PCI backend. Writing a string of DDDD:BB:DD.F will cause + the driver to perform an slot or bus reset if the device + supports. It also checks to make sure that all of the devices + under the bridge are owned by Xen PCI backend. diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index cc3cbb4..f830bf4 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -100,14 +100,9 @@ static void pcistub_device_release(struct kref *kref) xen_unregister_device_domain_owner(dev); - /* Call the reset function which does not take lock as this - * is called from "unbind" which takes a device_lock mutex. - */ - __pci_reset_function_locked(dev); + /* Reset is done by the toolstack by using 'do_flr' on the SysFS. */ if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state)) dev_info(&dev->dev, "Could not reload PCI state\n"); - else - pci_restore_state(dev); if (dev->msix_cap) { struct physdev_pci_device ppdev = { @@ -123,9 +118,6 @@ static void pcistub_device_release(struct kref *kref) err); } - /* Disable the device */ - xen_pcibk_reset_device(dev); - kfree(dev_data); pci_set_drvdata(dev, NULL); @@ -242,6 +234,87 @@ struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device *pdev, return found_dev; } +struct wrapper_args { + struct pci_dev *dev; + int in_use; +}; + +static int pcistub_pci_walk_wrapper(struct pci_dev *dev, void *data) +{ + struct pcistub_device *psdev, *found_psdev = NULL; + struct wrapper_args *arg = data; + unsigned long flags; + + spin_lock_irqsave(&pcistub_devices_lock, flags); + list_for_each_entry(psdev, &pcistub_devices, dev_list) { + if (psdev->dev == dev) { + found_psdev = psdev; + if (psdev->pdev) + arg->in_use++; + break; + } + } + spin_unlock_irqrestore(&pcistub_devices_lock, flags); + dev_dbg(&dev->dev, "%s\n", found_psdev ? "OK" : "not owned by us!"); + + if (!found_psdev) + arg->dev = dev; + return found_psdev ? 0 : 1; +} + +static int pcistub_reset_pci_dev(struct pci_dev *dev) +{ + struct xen_pcibk_dev_data *dev_data; + struct wrapper_args arg = { .dev = NULL, .in_use = 0 }; + bool slot = false, bus = false; + int rc; + + if (!dev) + return -EINVAL; + + if (!pci_probe_reset_slot(dev->slot)) + slot = true; + else if (!pci_probe_reset_bus(dev->bus)) { + /* We won't attempt to reset a root bridge. */ + if (!pci_is_root_bus(dev->bus)) + bus = true; + } + dev_dbg(&dev->dev, "resetting (FLR, D3, %s %s) the device\n", + slot ? "slot" : "", bus ? "bus" : ""); + + pci_walk_bus(dev->bus, pcistub_pci_walk_wrapper, &arg); + + if (arg.in_use) + dev_err(&dev->dev, "is in use!\n"); + + /* + * Takes the PCI lock. OK to do it as we are never called + * from 'unbind' state and don't deadlock. + */ + dev_data = pci_get_drvdata(dev); + if (!pci_load_saved_state(dev, dev_data->pci_saved_state)) + pci_restore_state(dev); + + pci_reset_function(dev); + + /* This disables the device. */ + xen_pcibk_reset_device(dev); + + /* And cleanup up our emulated fields. */ + xen_pcibk_config_reset_dev(dev); + + if (!bus && !slot) + return 0; + + /* All slots or devices under the bus should be part of pcistub! */ + if (arg.dev) { + dev_err(&dev->dev, "depends on: %s!\n", pci_name(arg.dev)); + return -EBUSY; + } + return slot ? pci_try_reset_slot(dev->slot) : + pci_try_reset_bus(dev->bus); +} + /* * Called when: * - XenBus state has been reconfigure (pci unplug). See xen_pcibk_remove_device @@ -277,8 +350,9 @@ void pcistub_put_pci_dev(struct pci_dev *dev) * pcistub and xen_pcibk when AER is in processing */ down_write(&pcistub_sem); - /* Cleanup our device - * (so it's ready for the next domain) + /* Cleanup our device (so it's ready for the next domain) + * That is the job of the toolstack which has to call 'do_flr' before + * providing the PCI device to a guest (see pcistub_do_flr). */ device_lock_assert(&dev->dev); __pci_reset_function_locked(dev); @@ -1389,6 +1463,29 @@ static ssize_t permissive_show(struct device_driver *drv, char *buf) static DRIVER_ATTR(permissive, S_IRUSR | S_IWUSR, permissive_show, permissive_add); +static ssize_t pcistub_do_flr(struct device_driver *drv, const char *buf, + size_t count) +{ + int domain, bus, slot, func; + int err; + struct pcistub_device *psdev; + + err = str_to_slot(buf, &domain, &bus, &slot, &func); + if (err) + goto out; + + psdev = pcistub_device_find(domain, bus, slot, func); + if (psdev) { + err = pcistub_reset_pci_dev(psdev->dev); + pcistub_device_put(psdev); + } else + err = -ENODEV; +out: + if (!err) + err = count; + return err; +} +static DRIVER_ATTR(do_flr, S_IWUSR, NULL, pcistub_do_flr); static void pcistub_exit(void) { driver_remove_file(&xen_pcibk_pci_driver.driver, &driver_attr_new_slot); @@ -1402,6 +1499,8 @@ static void pcistub_exit(void) &driver_attr_irq_handlers); driver_remove_file(&xen_pcibk_pci_driver.driver, &driver_attr_irq_handler_state); + driver_remove_file(&xen_pcibk_pci_driver.driver, + &driver_attr_do_flr); pci_unregister_driver(&xen_pcibk_pci_driver); } @@ -1495,6 +1594,9 @@ static int __init pcistub_init(void) if (!err) err = driver_create_file(&xen_pcibk_pci_driver.driver, &driver_attr_irq_handler_state); + if (!err) + err = driver_create_file(&xen_pcibk_pci_driver.driver, + &driver_attr_do_flr); if (err) pcistub_exit();