From patchwork Fri Sep 22 02:09:12 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Christopher Clark X-Patchwork-Id: 9965041 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 4650560381 for ; Fri, 22 Sep 2017 02:12:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 26E54297AD for ; Fri, 22 Sep 2017 02:12:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1B117297B0; Fri, 22 Sep 2017 02:12:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.6 required=2.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED, DKIM_SIGNED, FREEMAIL_FROM, RCVD_IN_DNSWL_MED, RCVD_IN_SORBS_SPAM, T_DKIM_INVALID autolearn=ham version=3.3.1 Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 877BC297AD for ; Fri, 22 Sep 2017 02:12:01 +0000 (UTC) Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dvDP7-00034s-Jf; Fri, 22 Sep 2017 02:09:21 +0000 Received: from mail6.bemta6.messagelabs.com ([193.109.254.103]) by lists.xenproject.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dvDP5-00034m-JC for xen-devel@lists.xenproject.org; Fri, 22 Sep 2017 02:09:19 +0000 Received: from [193.109.254.147] by server-1.bemta-6.messagelabs.com id A2/1D-03414-EC074C95; Fri, 22 Sep 2017 02:09:18 +0000 X-Brightmail-Tracker: H4sIAAAAAAAAA+NgFlrJKsWRWlGSWpSXmKPExsVyMfTOat0zBUc iDVb/1Lb4vmUykwOjx+EPV1gCGKNYM/OS8isSWDPuNjxnL5j3nrFi38eJLA2MU58zdjFycQgJ zGSUuH5mE5jDItDAKrGgp5uli5GTQ0JgDqvEyWNxEHaZxPXVs1kh7CKJHd+fsEHYhRInt24Fq +cVEJQ4OfMJmC0k4ClxfuUOdhCbU8Bc4tePDjaIbbsZJU72vmYGSbABJdb37WAEsVkEVCW+dF 9khhgUINEy4SlQMweHsECsxOrGGBBTREBX4tvVApAxzALnGSW+bj7HChJnFnCT2DFFcAKj4Cw kV8xCyICEmQU0JVq3/2aHsLUlli18zQxhZ0n8ObaBFcKOlzg1cSpUjaLElO6HYLaEgIzExYM3 mBYwcqxi1ChOLSpLLdI1NNJLKspMzyjJTczM0TU0MNPLTS0uTkxPzUlMKtZLzs/dxAiMGAYg2 MF4eWPAIUZJDiYlUV7f7CORQnxJ+SmVGYnFGfFFpTmpxYcYZTg4lCR4k/KBcoJFqempFWmZOc DYhUlLcPAoifDKg6R5iwsSc4sz0yFSpxi9OS7cufSHiePYpstA8sCeW0Dy0Y27QHLGTRD55Nq 8v0xCLHn5ealS4rxbQEYIgIzIKM2DWwBLQZcYZaWEeRmBThbiKUgtys0sQZV/xSjOwagkzNsD MoUnM68E7o5XQCcyAZ2YveEAyIkliQgpqQbG4l+GKf6N7kuU/srczt8na9x978idX4JJDDdvL 418PSdoPUeuyuRzGdcuLavzszzDwupwmyspdq6nRXKabh/Xzug0870ffrKs+HpSwGvahtUv2P dkv53LrV1+nCtqabBbfHtVaOXyyPVJQoZx8jUdeqtkU+Yq7n0195HOEpd5E+feOLBW8Yq2Ekt xRqKhFnNRcSIAQN1s/DwDAAA= X-Env-Sender: christopher.w.clark@gmail.com X-Msg-Ref: server-13.tower-27.messagelabs.com!1506046155!108444145!1 X-Originating-IP: [209.85.220.171] X-SpamReason: No, hits=0.8 required=7.0 tests=BODY_RANDOM_LONG, RCVD_BY_IP X-StarScan-Received: X-StarScan-Version: 9.4.45; banners=-,-,- X-VirusChecked: Checked Received: (qmail 31281 invoked from network); 22 Sep 2017 02:09:16 -0000 Received: from mail-qk0-f171.google.com (HELO mail-qk0-f171.google.com) (209.85.220.171) by server-13.tower-27.messagelabs.com with AES128-GCM-SHA256 encrypted SMTP; 22 Sep 2017 02:09:16 -0000 Received: by mail-qk0-f171.google.com with SMTP id r141so7567622qke.2 for ; Thu, 21 Sep 2017 19:09:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=PYLJ++rEMKkaRPXiMFy8VXYZQKxj383qXCY2XNUiuiM=; b=GaRebYIW1EzBxx8iiQ77jPB+ZIIDYBQTfHbln2mMnGrjXJ3vjM1Dszj4NiP1Lr9JN0 4MkuWsAIAPedyIzd61guUrjKhBN0x1w3U+sx9is3yNQGbEzkzaJ7m19gre2NS6Xx+lLY PKqwhxdXOCWDFsQk+7aaEgyb7PxCHk2Fp9xkqCJMvt5FIZcINLKjO0TJcU4Qx14e9V19 6vZpUpbIgU443hk9iXJv06mKbzIVzldmKf5iD9G38sKCIwhAdEYqEFXaGvK4l3lAM/92 tXDDQuRDnlZvddlor0OhZdsCrAbGfyh6sZCZRUI48SirZOFgYowXXkIxQJSNK1ly2PrH a7Qg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=PYLJ++rEMKkaRPXiMFy8VXYZQKxj383qXCY2XNUiuiM=; b=d+8URBx0i/urHgqsderR6Uw/Wg1r7n7QD5cLDV2mJKqhDWyT8Xxi9dadA3tlIBkQjM SL0zxIOmsZh5SIXdR/mfKI60PGwLGl/Olu6SEtIysb76bt2u7/6bVXibMx6E/9OfMfZ2 kWVV2+eQ3MvO1FjL1fE0uOVZD3HGFURU1holUgydvKovdsUIOueq/wmRJJAiTHDSzx1P D1u5kowlbqdwQKH1+aZMcpxwVcEQPEoQBIxsOThVCuyonZzH0+TUlYio19+AExL8T6Yk oNCodb9iwREIqMZ/4cnPV4b8MXLTtbUEbr3Jj5DGumP9VZJhdy2+2JXe2y4aT7W28WdD 0XXQ== X-Gm-Message-State: AHPjjUhkgICMdG+sXt5lVX0W4NCutNJK4OgK7CggO7vi0IAMNwMESy82 X77cksX8kdmWKuf+GyrL6LuXtu1wKagdFgwfWl8= X-Google-Smtp-Source: AOwi7QCDHndSurLeMdTv0V9mB1UTu8feYlfO89zn/FPGM+rb/ql38T0frohMbpZ7LifYsZJdUcNJM9Ed+QSyxX0OOnI= X-Received: by 10.55.98.18 with SMTP id w18mr6149540qkb.163.1506046155181; Thu, 21 Sep 2017 19:09:15 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.91.6 with HTTP; Thu, 21 Sep 2017 19:09:12 -0700 (PDT) In-Reply-To: <1707661043.20170921222701@eikelenboom.it> References: <609868325.225353.1505937035977.JavaMail.zimbra@savoirfairelinux.com> <20170921083952.r2moey2n7qgnowgo@dhcp-3-128.uk.xensource.com> <1707661043.20170921222701@eikelenboom.it> From: Christopher Clark Date: Thu, 21 Sep 2017 19:09:12 -0700 Message-ID: To: Sander Eikelenboom Cc: xen-devel@lists.xenproject.org, =?UTF-8?Q?Roger_Pau_Monn=C3=A9?= , Kyle Temkin , =?UTF-8?B?SsOpcsO0bWUgT3VmZWxsYQ==?= Subject: Re: [Xen-devel] pci-passthrough loses msi-x interrupts ability after domain destroy X-BeenThere: xen-devel@lists.xen.org X-Mailman-Version: 2.1.18 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" X-Virus-Scanned: ClamAV using ClamSMTP On Thu, Sep 21, 2017 at 1:27 PM, Sander Eikelenboom wrote: > > On Thu, September 21, 2017, 10:39:52 AM, Roger Pau Monné wrote: > >> On Wed, Sep 20, 2017 at 03:50:35PM -0400, Jérôme Oufella wrote: >>> >>> I'm using PCI pass-through to map a PCIe (intel i210) controller into >>> a HVM domain. The system uses xen-pciback to hide the appropriate PCI >>> device from Dom0. >>> >>> When creating the HVM domain after an hypervisor cold boot, the HVM >>> domain can access and use the PCIe controller without problem. >>> >>> However, if the HVM domain is destroyed then restarted, it won't be >>> able to use the pass-through PCI device anymore. The PCI device is >>> seen and can be mapped, however, the interrupts will not be passed to >>> the HVM domain anymore (this is visible under a Linux guest as >>> /proc/interrupts counters remain 0). The behavior on a Windows10 guest >>> is the same. >>> >>> A few interesting hints I noticed: >>> >>> - On Dom0, 'lspci -vv' on that PCIe device between the "working" and >>> the "muted interrupts" states, I noted a difference between the >>> MSI-X caps: >>> >>> - Capabilities: [70] MSI-X: Enable- Count=5 Masked- <-- IRQs will work if domain started >>> + Capabilities: [70] MSI-X: Enable- Count=5 Masked+ <-- IRQs won't work if domain started >>> ^^^^^^^ > >> IMHO it seems that either your device is not able to perform a reset >> successfully, or Linux is not correctly performing such reset. I don't >> think there's a lot that can be done from the Xen side. > > Unfortunately for a lot of pci-devices a simple reset as performed by default isn't enough, > but also almost none support a real pci FLR. > > In the distant past Konrad has made a patchset that implemented a bus reset and > reseting config space. (It piggy backed on already existing libxl mechanism of > trying to call on a syfs "do_flr" attribute which triggers pciback to perform > the busreset and rewrite of config space for the device. > > I use that patchset ever since for my pci-passtrough needs and it works pretty > well. I can shutdown an restart VM's with pci devices passed trhough (also AMD > Radeon graphic cards). Just to confirm the utility of that piece of work: OpenXT also uses an extended version of that same patch to perform device reset for passthrough. I've attached a copy of that OpenXT patch to this message and it can also be obtained from our git repository: https://github.com/OpenXT/xenclient-oe/blob/f8d3b282a87231d9ae717b13d506e8e7e28c78c4/recipes-kernel/linux/4.9/patches/thorough-reset-interface-to-pciback-s-sysfs.patch This version creates a sysfs node named "reset_device" and the OpenXT libxl toolstack is patched to use that node instead of "do_flr". Konrad's original work encountered pushback on upstream acceptance at the time it was developed. I'm not sure I've found where that discussion ended. Is there any prospect of a more comprehensive reset mechanism being accepted into xen-pciback, or elsewhere in the kernel? As noted in the original LKML threads, vfio has similar relevant pci device reset retry logic. (Thanks to Rich Persaud for this pointer:) http://elixir.free-electrons.com/linux/v4.14-rc1/source/drivers/vfio/pci/vfio_pci.c#L1353 libvirt also performs similar reset logic, using a direct low level interface to config space (Thanks to Marek for this pointer, libvirt is used by Qubes:) https://github.com/libvirt/libvirt/blob/master/src/util/virpci.c#L929 I thinks this indicates that it would be possible to extend libxl to do something similar, but that seems less satisfactory compared to performing the work in a kernel-provided implementation. Is there a way forward to providing this functionality within Xen software or Linux? Christopher --- openxt.org From d686351d8ea4a1ea1d755d0a10f6f14d1c870911 Mon Sep 17 00:00:00 2001 From: Kyle Temkin Date: Wed, 8 Apr 2015 00:58:24 -0400 Subject: [PATCH] Add thorough reset interface to pciback's sysfs. -------------------------------------------------------------------------------- SHORT DESCRIPTION: -------------------------------------------------------------------------------- Adds an interface that allows "more thorough" resets to be performed on devices which don't support Function Level Resets (FLRs). This interface should allow the toolstack to ensure that a PCI device is in a known state prior to passing it through to a VM. -------------------------------------------------------------------------------- LONG DESCRIPTION: -------------------------------------------------------------------------------- From Konrad Rzeszutek Wilk's original post to xen-devel and the LKML: The life-cycle of a PCI device in Xen pciback is complex and is constrained by the PCI generic locking mechanism. It starts with the device being binded to us - for which we do a device function reset (and done via SysFS so the PCI lock is held) If the device is unbinded from us - we also do a function reset (also done via SysFS so the PCI lock is held). If the device is un-assigned from a guest - we do a function reset (no PCI lock). All on the individual PCI function level (so bus:device:function). Unfortunatly a function reset is not adequate for certain PCIe devices. The reset for an individual PCI function "means device must support FLR (PCIe or AF), PM reset on D3hot->D0 device specific reset, or be a singleton device on a bus a secondary bus reset. FLR does not have widespread support, reset is not very reliable, and bus topology is dictated by the and device design. We need to provide a means for a user to a bus reset in cases where the existing mechanisms are not or not reliable. " (Adam Williamson, 'vfio-pci: PCI hot reset interface' commit 8b27ee60bfd6bbb84d2df28fa706c5c5081066ca). As such to do a slot or a bus reset is we need another mechanism. This is not exposed SysFS as there is no good way of exposing a bus topology there. This is due to the complexity - we MUST know that the different functions off a PCIe device are not in use by other drivers, or if they are in use (say one of them is assigned to a guest and the other is idle) - it is still OK to reset the slot (assuming both of them are owned by Xen pciback). This patch does that by doing an slot or bus reset (if slot not supported) if all of the functions of a PCIe device belong to Xen PCIback. We do not care if the device is in-use as we depend on the toolstack to be aware of this - however if it is we will WARN the user. Due to the complexity with the PCI lock we cannot do the reset when a device is binded ('echo $BDF > bind') or when unbinded ('echo $BDF > unbind') as the pci_[slot|bus]_reset also take the same lock resulting in a dead-lock. Putting the reset function in a workqueue or thread won't work either - as we have to do the reset function outside the 'unbind' context (it holds the PCI lock). But once you 'unbind' a device the device is no longer under the ownership of Xen pciback and the pci_set_drvdata has been reset so we cannot use a thread for this. Instead of doing all this complex dance, we depend on the toolstack doing the right thing. As such implement [... a SysFS attribute] which [... the toolstack] uses when a device is detached or attached from/to a guest. It bypasses the need to worry about the PCI lock. To not inadvertly do a bus reset that would affect devices that are in use by other drivers (other than Xen pciback) prior to the reset we check that all of the devices under the bridge are owned by Xen pciback. If they are not we do not do the bus (or slot) reset. We also warn the user if the device is in use - but still continue with the reset. This should not happen as the toolstack also does the check. -- Our version of the patch has been modified to use a less confusing sysfs name. The original name ('do_flr') is inappropriate, as it implies a function level reset; it is entirely possible that the patch code will use a bus-level reset when appropriate. The new sysfs entry is located at: /sys/bus/pci/drivers/pciback/reset_device and can be activated by writing a domain:bus:device:function device identifier into the sysfs file. As an example: echo "0000:01:00.0" > /sys/bus/pci/drivers/pciback/reset_device would reset the device matching the D:BDF descriptor above. -------------------------------------------------------------------------------- CHANGELOG: -------------------------------------------------------------------------------- This is a port of a patch that likely had many authors, including: -Konrad Rzeszutek Wilk -Alex Williamson -Ross Phillipson Ported to OpenXT by: Kyle J. Temkin , 4/8/15 Rewrite by: Kyle J. Temkin , 4/10/15 -------------------------------------------------------------------------------- DEPENDENCIES -------------------------------------------------------------------------------- This patch requires ONE of the following: -A relatively modern linux kernel (3.18+) as a base; which provides the PCI functions used; or -Our PCI reset backports patch (backport-pci-reset-functionality.patch), which backports the relevant functionality to 3.11. To take advantage of this patch, the utilized toolstack should be changed to: -Use the provided "reset_device" property, rather than the PCI device's sysfs "reset" entry. This enables resets beyond a FLR to be used. -Ensure that all functions of a given device are passed through together. This allows us to use some of the more thorugh resetting techniques, when possible. -------------------------------------------------------------------------------- REMOVAL -------------------------------------------------------------------------------- This patch provides a service which is necessary for proper passthrough of many PCI cards: a generalized ability to reset PCI devices, without requiring that the device support FLR or power-management based resets. This patch will be necessary until either the Linux PCI subsystem or Xen PCIback drivers are modified to provide this support; or until cards without proper FLR support are no longer supported. -------------------------------------------------------------------------------- UPSTREAM PLAN -------------------------------------------------------------------------------- This code is taken from a patch which was originally proposed and rejected from upstream on the LKML and xen-devel. An upstream implementation of the functionality of this patch is still necessary; and can and should be implemented. This patch will hopefully be replaced with an upstream version when community concensus has produced a single "blessed" method of accomplishing its functionality. -------------------------------------------------------------------------------- PATCHES -------------------------------------------------------------------------------- --- drivers/xen/xen-pciback/pci_stub.c | 338 ++++++++++++++++++++++++++++++++++--- 1 file changed, 312 insertions(+), 26 deletions(-) Index: linux-4.9.40/drivers/xen/xen-pciback/pci_stub.c =================================================================== --- linux-4.9.40.orig/drivers/xen/xen-pciback/pci_stub.c +++ linux-4.9.40/drivers/xen/xen-pciback/pci_stub.c @@ -102,10 +102,9 @@ static void pcistub_device_release(struc xen_unregister_device_domain_owner(dev); - /* Call the reset function which does not take lock as this - * is called from "unbind" which takes a device_lock mutex. - */ - __pci_reset_function_locked(dev); + + /* Reset is done by the toolstack by using 'reset_device' on the + * SysFS. */ if (pci_load_and_free_saved_state(dev, &dev_data->pci_saved_state)) dev_info(&dev->dev, "Could not reload PCI state\n"); else @@ -125,9 +124,6 @@ static void pcistub_device_release(struc err); } - /* Disable the device */ - xen_pcibk_reset_device(dev); - kfree(dev_data); pci_set_drvdata(dev, NULL); @@ -224,6 +220,271 @@ struct pci_dev *pcistub_get_pci_dev_by_s return found_dev; } + +/** + * Returns true iff the given device supports PCIe FLRs. + */ +static bool __device_supports_pcie_flr(struct pci_dev *dev) +{ + u32 cap; + + /* + * Read the device's capabilities. Note that this can be used even on legacy + * PCI devices (and not just on PCIe devices)-- it indicates that no capabilities + * are supported if the device is legacy PCI by setting cap to 0. + */ + pcie_capability_read_dword(dev, PCI_EXP_DEVCAP, &cap); + + /* Return true iff the device advertises supporting an FLR. */ + return (cap & PCI_EXP_DEVCAP_FLR); +} + + +/** + * Returns true iff the given device supports PCI Advanced Functionality (AF) FLRs. + */ +static bool __device_supports_pci_af_flr(struct pci_dev *dev) +{ + int pos; + u8 capability_flags; + + /* First, try to find the location of the PCI Advanced Functionality capability byte. */ + pos = pci_find_capability(dev, PCI_CAP_ID_AF); + + /* + * If we weren't able to find the capability byte, this device doesn't support + * the Advanced Functionality extensions, and thus won't support AF FLR. + */ + if (!pos) + return false; + + /* Read the capabilities advertised in the AF capability byte. */ + pci_read_config_byte(dev, pos + PCI_AF_CAP, &capability_flags); + + /* + * If the device does support AF, it will advertise FLR support via the + * PCI_AF_CAP_FLR bit. We'll also check for the Transactions Pending (TP) + * mechanism, as the kernel requires this extension to issue an AF FLR. + * (Internally, the PCI reset code needs to be able to wait for all + * pending transactions to complete prior to issuing the AF FLR.) + */ + return (capability_flags & PCI_AF_CAP_TP) && (capability_flags & PCI_AF_CAP_FLR); +} + + +/** + * Returns true iff the given device adverstises supporting function- + * level-reset (FLR). + */ +static bool device_supports_flr(struct pci_dev *dev) +{ + return __device_supports_pci_af_flr(dev) || __device_supports_pcie_flr(dev); +} + + +/** + * Returns true iff the given device is located in a slot that + * supports hotplugging slot resets. + */ +static bool device_supports_slot_reset(struct pci_dev *dev) +{ + return !pci_probe_reset_slot(dev->slot); +} + + +/** + * Returns true iff the given device is located on a bus that + * we can reset. Note that root bridges are excluded, as this + * would cause more than just an SBR. + */ +static bool device_supports_bus_reset(struct pci_dev *dev) +{ + return !pci_is_root_bus(dev->bus) && !pci_probe_reset_bus(dev->bus); +} + + +/** + * Out argument for the __safe_to_sbr_device_callback function. + */ +struct safe_to_sbr_arguments { + + //Stores the most recently encountered PCI device that does + //not belong to pciback. As used below, this is the result of a + //search for a non-pciback device on a bus; we stop upon finding + //the first non-pciback device. + struct pci_dev *last_non_pciback_device; + + //Stores the number of pciback devices that appear to be in use + //on the bus in question. + int use_count; + +}; + + +/** + * A callback function which determines if a given PCI device is owned by pciback, + * and whether the given device is in use. Used by safe_to_sbr_device. + * + * @param dev The PCI device to be checked. + * @param data An out argument of type struct safe_to_sbr_device_callback_arguments. + * Updated to indicate the result of the search. See the struct's definition + * for more details. + * + */ +static int __safe_to_sbr_device_callback(struct pci_dev *dev, void *data) +{ + + struct pcistub_device *psdev; + + bool device_owned_by_pciback = false; + struct safe_to_sbr_arguments *arg = data; + + unsigned long flags; + + //Ensure that we have exclusive access to the list of PCI devices, + //so we can traverse it. + spin_lock_irqsave(&pcistub_devices_lock, flags); + + //Iterate over all PCI devices owned by the pci stub. + list_for_each_entry(psdev, &pcistub_devices, dev_list) { + + //If the given device is owned by pciback... + if (psdev->dev == dev) { + + //mark it as a pciback device. + device_owned_by_pciback = true; + + //If we have a physical device associated with the pciback device, + //mark this device as in-use. + if (psdev->pdev) + arg->use_count++; + + //Stop searching; we've found a the PCIback device associated with this one. + break; + } + } + + //Release the PCI device lock... + spin_unlock_irqrestore(&pcistub_devices_lock, flags); + + //... and report if we've found a device that's not owned by pciback. + dev_dbg(&dev->dev, "%s\n", device_owned_by_pciback ? "is owned by pciback, and can be reset if not in use." + : "not owned by pciback, and thus cannot be reset."); + + //If we've found a device that's not owned by pciback, update our data + //argument so it points to the most recent unowned device. (We check + //this like a flag, later: if it's never set, no one owns the device!) + if (!device_owned_by_pciback) + arg->last_non_pciback_device = dev; + + //If we've found a device that's not owned by pciback, return false-- + //this indicates that pci_walk_bus should cease its walk. + return !device_owned_by_pciback; +} + + +/** + * Returns true iff it should be safe to issue a secondary bus reset + * to the device; that is, if an SBR can be issued without disrupting + * other devices. + */ +static bool safe_to_sbr_device(struct pci_dev *dev) +{ + struct safe_to_sbr_arguments walk_result = { .last_non_pciback_device = NULL, .use_count = 0 }; + + //Walk the PCI bus, attempting to find if any of the given devices + pci_walk_bus(dev->bus, __safe_to_sbr_device_callback, &walk_result); + + //If the device is in use, emit a warning error. + if(walk_result.use_count > 0) + dev_dbg(&dev->dev, "is in use; currently not safe to SBR device.\n"); + + //Return true iff we did not pick up any other devices + //that were either in use, or not owned by pciback. + return (walk_result.last_non_pciback_device == NULL) && (walk_result.use_count == 0); +} + + +/** + * Attempt a raw reset of the provided PCI device-- via any + * method available to us. This method prefers the gentlest + * possible reset method-- currently an FLR, which many + * PCIe devices should support. + * + * @param dev The pci device to be reset. + * @return Zero on success, or the error code generated by the reset method on failure. + */ +static int __pcistub_raw_device_reset(struct pci_dev *dev) +{ + //Determine if bus resetting techniques (SBR, slot resets) + //are safe, and thus should be allowed. + int allow_bus_reset = safe_to_sbr_device(dev); + + //If FLRs are supported; we'll try to let the linux kernel + //manually reset the device. + if(device_supports_flr(dev)) { + dev_dbg(&dev->dev, "Resetting device using an FLR."); + return pci_reset_function(dev); + } + + //Next, we'll try the next gentlest: a hotplugging reset + //of the PCI slot. + if(allow_bus_reset && device_supports_slot_reset(dev)) { + dev_dbg(&dev->dev, "Resetting device using a slot reset."); + return pci_try_reset_slot(dev->slot); + } + + //Finally, we'll try the most drastic: resetting the parent + //PCI bus-- which we can only do conditionally. + if(allow_bus_reset && device_supports_bus_reset(dev)) { + dev_dbg(&dev->dev, "Resetting device using an SBR."); + return pci_try_reset_bus(dev->bus); + } + + //If we weren't able to reset the device by any of our known-good methods, + //fall back to the linux kernel's reset function. Unfortunately, this considers a + //power management reset to be a valid reset; though this doesn't work for many devices-- + //especially GPUs. + dev_err(&dev->dev, "No reset methods available for %s. Falling back to kernel reset.", pci_name(dev)); + pci_reset_function(dev); + + //Return an error code, indicating that we likely did not reset the device correctly. + return -ENOTTY; +} + + +/** + * Resets the target (pciback-owned) PCI device. Primarily intended + * for use by the toolstack, so it can ensure a consistent PCI device + * state on VM startup. + * + * @param dev The device to be reset. + * @return Zero on success, or a negated error code on failure. + */ +static int pcistub_reset_pci_dev(struct pci_dev *dev) +{ + int rc; + + if (!dev) + return -EINVAL; + + /* + * Takes the PCI lock. OK to do it as we are never called + * from 'unbind' state and don't deadlock. + */ + rc =__pcistub_raw_device_reset(dev); + pci_restore_state(dev); + + /* This disables the device. */ + xen_pcibk_reset_device(dev); + + /* And cleanup up our emulated fields. */ + xen_pcibk_config_reset_dev(dev); + return rc; +} + + + struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device *pdev, struct pci_dev *dev) { @@ -279,11 +540,13 @@ void pcistub_put_pci_dev(struct pci_dev * pcistub and xen_pcibk when AER is in processing */ down_write(&pcistub_sem); - /* Cleanup our device - * (so it's ready for the next domain) - */ device_lock_assert(&dev->dev); - __pci_reset_function_locked(dev); + /* + * Reset is up to the toolstack. + * The toolstack has to call 'reset_device' before + * providing the PCI device to a guest (see pcistub_reset_device). + */ + //__pci_reset_function_locked(dev); dev_data = pci_get_drvdata(dev); ret = pci_load_saved_state(dev, dev_data->pci_saved_state); @@ -1460,6 +1723,41 @@ static ssize_t restrictive_add(struct de } static DRIVER_ATTR(restrictive, S_IWUSR, NULL, restrictive_add); +/** + * Handles the "reset_device" sysfs attribute. This is the primary reset interface + * utilized by the toolstack. + */ +static ssize_t pcistub_sysfs_reset_device(struct device_driver *drv, const char *buf, size_t count) +{ + int domain, bus, slot, func, err; + struct pcistub_device *psdev; + + //Attempt to convert the user's string to a BDF/slot. + err = str_to_slot(buf, &domain, &bus, &slot, &func); + if (err) + return -ENODEV; + + //... and then use that slot to find the pciback device. + psdev = pcistub_device_find(domain, bus, slot, func); + + //If we have a device, attempt to reset it using our internal reset path. + if (psdev) { + err = pcistub_reset_pci_dev(psdev->dev); + pcistub_device_put(psdev); + + //If we were not able to reset the device, return the relevant error code. + if(err) + err = -ENODEV; + } + //Otherwise, indicate that there's no such device. + else { + err = -ENODEV; + } + + return err ? err : count; + +} +static DRIVER_ATTR(reset_device, S_IWUSR, NULL, pcistub_sysfs_reset_device); static void pcistub_exit(void) { @@ -1476,6 +1774,8 @@ static void pcistub_exit(void) &driver_attr_irq_handlers); driver_remove_file(&xen_pcibk_pci_driver.driver, &driver_attr_irq_handler_state); + driver_remove_file(&xen_pcibk_pci_driver.driver, + &driver_attr_reset_device); pci_unregister_driver(&xen_pcibk_pci_driver); } @@ -1572,6 +1872,9 @@ static int __init pcistub_init(void) if (!err) err = driver_create_file(&xen_pcibk_pci_driver.driver, &driver_attr_irq_handler_state); + if (!err) + err = driver_create_file(&xen_pcibk_pci_driver.driver, + &driver_attr_reset_device); if (err) pcistub_exit();