From patchwork Mon Apr 10 13:43:13 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Konrad Rzeszutek Wilk X-Patchwork-Id: 9672685 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id EDE7B600CB for ; Mon, 10 Apr 2017 13:46:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DFAE21FE82 for ; Mon, 10 Apr 2017 13:46:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D40DB28449; Mon, 10 Apr 2017 13:46:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, UNPARSEABLE_RELAY autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8E6271FE82 for ; Mon, 10 Apr 2017 13:46:36 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cxZbY-0000yG-It; Mon, 10 Apr 2017 13:43:40 +0000 Received: from mail6.bemta5.messagelabs.com ([195.245.231.135]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1cxZbW-0000yA-VR for xen-devel@lists.xen.org; Mon, 10 Apr 2017 13:43:39 +0000 Received: from [85.158.139.211] by server-10.bemta-5.messagelabs.com id 4A/E8-02186-A0C8BE85; Mon, 10 Apr 2017 13:43:38 +0000 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFlrPKsWRWlGSWpSXmKPExsUyZ7p8oC5Lz+s Ig83HDCyWfFzM4sDocXT3b6YAxijWzLyk/IoE1oyupa3MBd9WMVbcOHCFsYHxbzdjFyMXh5DA RCaJ55uOs0E43xglJh9eCuVsZJTYdWgjO4TTzSjxdv0R5i5GTiCnSOLf5+mMIDaLgKrE9LUbg GwODjYBE4k3qxxBwiICuhLPFjwDG8QsMI9Z4kzbdiYQR1iggVHiwsLFLCBVvALmEvcvzGOG2N DBKNG8/TY7REJQ4uTMJywgU5kFrCR2TvaGMKUllv/jAKngFLCTmLD5CROILSqgLLH71l6w25g F5CW2v50DZksIGEp83riUeQKj8CwkQ2chDJ2FMHQWkmYIW13iz7xLzBhKtCWWLXwNVaIlcePf SyYIO1vi/YS/zAsYOVcxqhenFpWlFuka6iUVZaZnlOQmZuboGhqY6uWmFhcnpqfmJCYV6yXn5 25iBEZePQMD4w7Gpl7nQ4ySHExKorzplq8jhPiS8lMqMxKLM+KLSnNSiw8xynBwKEnwVnUB5Q SLUtNTK9Iyc4ApACYtwcGjJMJrApLmLS5IzC3OTIdInWLU5Xi39MN7JiGWvPy8VClx3q8gRQI gRRmleXAjYOnoEqOslDAvIwMDgxBPQWpRbmYJqvwrRnEORiVh3v0gU3gy80rgNr0COoIJ6Igz u16CHFGSiJCSamBscF11Ko3nx4o8iRXKHK5H2HkMV09+ElnOpHYgs4hXiddVz2Zn15HX1b91S 1+cL5h1UOMjr+H5hXxTTma+UZOJ7GTafnJtvpOX9IZu/tM7zIMmVkbVbd8gPW9nxEFJX54Y04 QPh2+WPfWLnH3u9hr+dxWN9yvj29o3VddbN/Rs5l2wdeLWUxlKLMUZiYZazEXFiQC+FFNaQgM AAA== X-Env-Sender: konrad.wilk@oracle.com X-Msg-Ref: server-14.tower-206.messagelabs.com!1491831810!53956270!1 X-Originating-IP: [156.151.31.81] X-SpamReason: No, hits=0.0 required=7.0 tests=sa_preprocessor: VHJ1c3RlZCBJUDogMTU2LjE1MS4zMS44MSA9PiAyODgzMzk=\n X-StarScan-Received: X-StarScan-Version: 9.4.12; banners=-,-,- X-VirusChecked: Checked Received: (qmail 1422 invoked from network); 10 Apr 2017 13:43:32 -0000 Received: from userp1040.oracle.com (HELO userp1040.oracle.com) (156.151.31.81) by server-14.tower-206.messagelabs.com with DHE-RSA-AES256-GCM-SHA384 encrypted SMTP; 10 Apr 2017 13:43:32 -0000 Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id v3ADhImM018352 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 10 Apr 2017 13:43:19 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id v3ADhHnc021994 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 10 Apr 2017 13:43:18 GMT Received: from abhmp0011.oracle.com (abhmp0011.oracle.com [141.146.116.17]) by aserv0121.oracle.com (8.13.8/8.13.8) with ESMTP id v3ADhFIH027435; Mon, 10 Apr 2017 13:43:16 GMT Received: from char.us.oracle.com (/10.137.176.158) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 10 Apr 2017 06:43:14 -0700 Received: by char.us.oracle.com (Postfix, from userid 1000) id A98BD6A09E3; Mon, 10 Apr 2017 09:43:13 -0400 (EDT) Date: Mon, 10 Apr 2017 09:43:13 -0400 From: Konrad Rzeszutek Wilk To: Andrew Cooper Message-ID: <20170410134313.GE21201@char.us.oracle.com> References: <9B58885955C6DD468CCDB27E993605EE0BA6DC19@SZXEMI502-MBS.china.huawei.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.8.0 (2017-02-23) X-Source-IP: userv0022.oracle.com [156.151.31.74] Cc: "Xuquan \(Quan Xu\)" , "sstabellini@kernel.org" , Wei Liu , Yanqiangjun , Ian Jackson , lidonglin , "xen-devel@lists.xen.org" , Fanhenglong , "dengkai \(A\)" Subject: Re: [Xen-devel] PCI passthrough will cause unknown device appearance in device manager of Windows OS via xenstore keys X-BeenThere: xen-devel@lists.xen.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP On Mon, Apr 10, 2017 at 01:46:20PM +0100, Andrew Cooper wrote: > On 10/04/17 12:24, lidonglin wrote: > > > > Hi all: > > > > I have one question about PCI passthrough. I found > > that if I created a VM with host pci device(cfg file as below), there > > were some xenstore keys exsiting in /local/domain/0 and > > /local/domain/DomID/. Besides I found one unknown device in device > > manager of Windows OS with Class ID FF80 from vendor ID 5853. My > > questions are as below: > > > > > > > > 1. Why do we create frontend and backend key pairs in xenstore? > > I think Passthrough is not PV, was it possible that we just used > > these keys to record something? > > > > 2. After review the code, I think backend keys are used to > > record state of hostdev, some codes will query something using these > > keys. But for frontend keys, can we delete them? > > > > 3. if frontend keys exist in xenstore, then unknown device will > > appear in device manager. Can we fix this? For an obsessive, he > > can’t stand for this. > > > > This is a libxl bug which has been reported before, but nothing happened. > > libxl must not start a PV pci-back when doing HVM PCI passthrough using > qemu, because absolutely nothing good can come of having two different > ways of poking at PCI state. .. Except that we need some way of doing FLR and Pciback is the one doing it. The right way would be to expand pciback to support the do_flr ioctl and combine it with your patch. > > The patch, unchanged from before, is > https://github.com/xenserver/xen-4.7.pg/blob/master/master/0001-libxl-Don-t-insert-PCI-device-into-xenstore-for-HVM-.patch Attaching the flr one (inline) and attachemnt From 314a0ff66af9380de4cd4c8a7f4f648c6f593bb8 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Wed, 3 Dec 2014 15:47:02 -0500 Subject: [PATCH] xen/pciback: Implement PCI reset slot or bus with 'do_flr' SysFS attribute The life-cycle of a PCI device in Xen pciback is complex and is constrained by the PCI generic locking mechanism. It starts with the device being binded to us - for which we do a device function reset (and done via SysFS so the PCI lock is held) If the device is unbinded from us - we also do a function reset (also done via SysFS so the PCI lock is held). If the device is un-assigned from a guest - we do a function reset (no PCI lock). All on the individual PCI function level (so bus:device:function). Unfortunatly a function reset is not adequate for certain PCIe devices. The reset for an individual PCI function "means device must support FLR (PCIe or AF), PM reset on D3hot->D0 device specific reset, or be a singleton device on a bus a secondary bus reset. FLR does not have widespread support, reset is not very reliable, and bus topology is dictated by the and device design. We need to provide a means for a user to a bus reset in cases where the existing mechanisms are not or not reliable. " (Adam Williamson, 'vfio-pci: PCI hot reset interface' commit 8b27ee60bfd6bbb84d2df28fa706c5c5081066ca). As such to do a slot or a bus reset is we need another mechanism. This is not exposed SysFS as there is no good way of exposing a bus topology there. This is due to the complexity - we MUST know that the different functions off a PCIe device are not in use by other drivers, or if they are in use (say one of them is assigned to a guest and the other is idle) - it is still OK to reset the slot (assuming both of them are owned by Xen pciback). This patch does that by doing an slot or bus reset (if slot not supported) if all of the functions of a PCIe device belong to Xen PCIback. We do not care if the device is in-use as we depend on the toolstack to be aware of this - however if it is we will WARN the user. Due to the complexity with the PCI lock we cannot do the reset when a device is binded ('echo $BDF > bind') or when unbinded ('echo $BDF > unbind') as the pci_[slot|bus]_reset also take the same lock resulting in a dead-lock. Putting the reset function in a workqueue or thread won't work either - as we have to do the reset function outside the 'unbind' context (it holds the PCI lock). But once you 'unbind' a device the device is no longer under the ownership of Xen pciback and the pci_set_drvdata has been reset so we cannot use a thread for this. Instead of doing all this complex dance, we depend on the toolstack doing the right thing. As such implement the 'do_flr' SysFS attribute which 'xl' uses when a device is detached or attached from/to a guest. It bypasses the need to worry about the PCI lock. To not inadvertly do a bus reset that would affect devices that are in use by other drivers (other than Xen pciback) prior to the reset we check that all of the devices under the bridge are owned by Xen pciback. If they are not we do not do the bus (or slot) reset. We also warn the user if the device is in use - but still continue with the reset. This should not happen as the toolstack also does the check. Signed-off-by: Konrad Rzeszutek Wilk Nacked-by: Konrad Rzeszutek Wilk --- Documentation/ABI/testing/sysfs-driver-pciback | 12 +++ drivers/xen/xen-pciback/pci_stub.c | 124 ++++++++++++++++++++++--- 2 files changed, 125 insertions(+), 11 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-driver-pciback b/Documentation/ABI/testing/sysfs-driver-pciback index 6a733bf..2d4e32f 100644 --- a/Documentation/ABI/testing/sysfs-driver-pciback +++ b/Documentation/ABI/testing/sysfs-driver-pciback @@ -11,3 +11,15 @@ Description: #echo 00:19.0-E0:2:FF > /sys/bus/pci/drivers/pciback/quirks will allow the guest to read and write to the configuration register 0x0E. + + +What: /sys/bus/pci/drivers/pciback/do_flr +Date: December 2014 +KernelVersion: 3.19 +Contact: xen-devel@lists.xenproject.org +Description: + An option to slot or bus reset an PCI device owned by + Xen PCI backend. Writing a string of DDDD:BB:DD.F will cause + the driver to perform an slot or bus reset if the device + supports. It also checks to make sure that all of the devices + under the bridge are owned by Xen PCI backend. diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index cc3cbb4..f830bf4 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -100,14 +100,9 @@ static void pcistub_device_release(struct kref *kref) xen_unregister_device_domain_owner(dev); - /* Call the reset function which does not take lock as this - * is called from "unbind" which takes a device_lock mutex. - */ - __pci_reset_function_locked(dev); + /* Reset is done by the toolstack by using 'do_flr' on the SysFS. */ if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state)) dev_info(&dev->dev, "Could not reload PCI state\n"); - else - pci_restore_state(dev); if (dev->msix_cap) { struct physdev_pci_device ppdev = { @@ -123,9 +118,6 @@ static void pcistub_device_release(struct kref *kref) err); } - /* Disable the device */ - xen_pcibk_reset_device(dev); - kfree(dev_data); pci_set_drvdata(dev, NULL); @@ -242,6 +234,87 @@ struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device *pdev, return found_dev; } +struct wrapper_args { + struct pci_dev *dev; + int in_use; +}; + +static int pcistub_pci_walk_wrapper(struct pci_dev *dev, void *data) +{ + struct pcistub_device *psdev, *found_psdev = NULL; + struct wrapper_args *arg = data; + unsigned long flags; + + spin_lock_irqsave(&pcistub_devices_lock, flags); + list_for_each_entry(psdev, &pcistub_devices, dev_list) { + if (psdev->dev == dev) { + found_psdev = psdev; + if (psdev->pdev) + arg->in_use++; + break; + } + } + spin_unlock_irqrestore(&pcistub_devices_lock, flags); + dev_dbg(&dev->dev, "%s\n", found_psdev ? "OK" : "not owned by us!"); + + if (!found_psdev) + arg->dev = dev; + return found_psdev ? 0 : 1; +} + +static int pcistub_reset_pci_dev(struct pci_dev *dev) +{ + struct xen_pcibk_dev_data *dev_data; + struct wrapper_args arg = { .dev = NULL, .in_use = 0 }; + bool slot = false, bus = false; + int rc; + + if (!dev) + return -EINVAL; + + if (!pci_probe_reset_slot(dev->slot)) + slot = true; + else if (!pci_probe_reset_bus(dev->bus)) { + /* We won't attempt to reset a root bridge. */ + if (!pci_is_root_bus(dev->bus)) + bus = true; + } + dev_dbg(&dev->dev, "resetting (FLR, D3, %s %s) the device\n", + slot ? "slot" : "", bus ? "bus" : ""); + + pci_walk_bus(dev->bus, pcistub_pci_walk_wrapper, &arg); + + if (arg.in_use) + dev_err(&dev->dev, "is in use!\n"); + + /* + * Takes the PCI lock. OK to do it as we are never called + * from 'unbind' state and don't deadlock. + */ + dev_data = pci_get_drvdata(dev); + if (!pci_load_saved_state(dev, dev_data->pci_saved_state)) + pci_restore_state(dev); + + pci_reset_function(dev); + + /* This disables the device. */ + xen_pcibk_reset_device(dev); + + /* And cleanup up our emulated fields. */ + xen_pcibk_config_reset_dev(dev); + + if (!bus && !slot) + return 0; + + /* All slots or devices under the bus should be part of pcistub! */ + if (arg.dev) { + dev_err(&dev->dev, "depends on: %s!\n", pci_name(arg.dev)); + return -EBUSY; + } + return slot ? pci_try_reset_slot(dev->slot) : + pci_try_reset_bus(dev->bus); +} + /* * Called when: * - XenBus state has been reconfigure (pci unplug). See xen_pcibk_remove_device @@ -277,8 +350,9 @@ void pcistub_put_pci_dev(struct pci_dev *dev) * pcistub and xen_pcibk when AER is in processing */ down_write(&pcistub_sem); - /* Cleanup our device - * (so it's ready for the next domain) + /* Cleanup our device (so it's ready for the next domain) + * That is the job of the toolstack which has to call 'do_flr' before + * providing the PCI device to a guest (see pcistub_do_flr). */ device_lock_assert(&dev->dev); __pci_reset_function_locked(dev); @@ -1389,6 +1463,29 @@ static ssize_t permissive_show(struct device_driver *drv, char *buf) static DRIVER_ATTR(permissive, S_IRUSR | S_IWUSR, permissive_show, permissive_add); +static ssize_t pcistub_do_flr(struct device_driver *drv, const char *buf, + size_t count) +{ + int domain, bus, slot, func; + int err; + struct pcistub_device *psdev; + + err = str_to_slot(buf, &domain, &bus, &slot, &func); + if (err) + goto out; + + psdev = pcistub_device_find(domain, bus, slot, func); + if (psdev) { + err = pcistub_reset_pci_dev(psdev->dev); + pcistub_device_put(psdev); + } else + err = -ENODEV; +out: + if (!err) + err = count; + return err; +} +static DRIVER_ATTR(do_flr, S_IWUSR, NULL, pcistub_do_flr); static void pcistub_exit(void) { driver_remove_file(&xen_pcibk_pci_driver.driver, &driver_attr_new_slot); @@ -1402,6 +1499,8 @@ static void pcistub_exit(void) &driver_attr_irq_handlers); driver_remove_file(&xen_pcibk_pci_driver.driver, &driver_attr_irq_handler_state); + driver_remove_file(&xen_pcibk_pci_driver.driver, + &driver_attr_do_flr); pci_unregister_driver(&xen_pcibk_pci_driver); } @@ -1495,6 +1594,9 @@ static int __init pcistub_init(void) if (!err) err = driver_create_file(&xen_pcibk_pci_driver.driver, &driver_attr_irq_handler_state); + if (!err) + err = driver_create_file(&xen_pcibk_pci_driver.driver, + &driver_attr_do_flr); if (err) pcistub_exit();