From patchwork Tue Oct 21 21:32:32 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 5128371
X-Patchwork-Delegate: bhelgaas@google.com
Message-ID: <1413927152.4202.195.camel@ul30vt.home>
Subject: Re: Hard and silent lock up since linux 3.14 with PCIe pass through (vfio)
From: Alex Williamson
To: Andreas Hartmann
Cc: Bjorn Helgaas, linux-pci
Date: Tue, 21 Oct 2014 15:32:32 -0600
In-Reply-To: <1413925580.4202.189.camel@ul30vt.home>
References:
 <20140923210318.498dacbd@dualc.maya.org>
 <1411502866.24563.8.camel@ul30vt.home>
 <5437A958.3000201@maya.org>
 <5437F1F5.3010706@maya.org>
 <543804BC.3080307@maya.org>
 <20141011003219.560cca97@dualc.maya.org>
 <20141010225408.GA24493@google.com>
 <5438CC1E.3060407@maya.org>
 <1413360267.4202.70.camel@ul30vt.home>
 <54406B34.1050808@maya.org>
 <1413925580.4202.189.camel@ul30vt.home>
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-pci@vger.kernel.org

On Tue, 2014-10-21 at 15:06 -0600, Alex Williamson wrote:
> Hi Andreas,
>
> On Fri, 2014-10-17 at 03:04 +0200, Andreas Hartmann wrote:
> > Hello Alex,
> >
> > Alex Williamson wrote:
> > > Hi Andreas,
> > [...]
> > > Sorry for the breakage.  Is it possible to run lspci on the device in a
> > > loop from the host and capture whether we're failing to restore some of
> > > the VC bits to their previous state?
> >
> >
> > > Does the problem also occur if you
> > > unbind from host driver,
> >
> > The machine is booted w/ blacklisted ath9k. Then, the device is bound to
> > vfio:
> >
> > echo "168c 0030" > /sys/bus/pci/drivers/vfio-pci/new_id
> > echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
> > echo 0000:03:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
> >
> > afterwards the VM is started -> hang.
> >
> > W/o starting the VM, I can bind it to vfio and unbind it from vfio w/o
> > any problem.
> >
> > > echo 1 > reset in pci-sysfs,
> >
> > echo 1 > /sys/bus/pci/devices/0000:03:00.0/reset works w/o any problem
> > while bound to vfio. Even after unbinding from vfio and rebinding to vfio
> > again ... .
> >
> > > and re-bind to the
> >
> > Do you mean loading ath9k in host system after unbinding from vfio? If
> > yes: Works w/o any problem. It's even possible to reset it or do a
> > ifconfig wlan0 up, ifconfig wlan0 down, rmmod ath9k, bind it to vfio
> > again and reset it, ....
> >
> > Looks like the hang is only triggered by qemu-system-x86_64 on startup
> > of the VM.

Also, this might be because QEMU since 1.7 favors a bus reset over a PM
reset for a device, while the sysfs reset interface only does a bus reset
if no other method is available and there are no other devices on the
bus.  Can you reproduce the hang using the sysfs reset interface, without
QEMU, if you modify the kernel as in the diff below?

> > > host?  I'll also try to reproduce on my 990fx system, but I won't be
> > > able to do that until next week due to travel.  Thanks,
>
> Could you send me the lspci -vvvxxxx for the device and parent root
> port?  Thanks,
>
> Alex
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

---
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3308,15 +3308,15 @@ static int __pci_dev_reset(struct pci_dev *dev, int prob
 	if (rc != -ENOTTY)
 		goto done;
 
-	rc = pci_pm_reset(dev, probe);
+	rc = pci_dev_reset_slot_function(dev, probe);
 	if (rc != -ENOTTY)
 		goto done;
 
-	rc = pci_dev_reset_slot_function(dev, probe);
+	rc = pci_parent_bus_reset(dev, probe);
 	if (rc != -ENOTTY)
 		goto done;
 
-	rc = pci_parent_bus_reset(dev, probe);
+	rc = pci_pm_reset(dev, probe);
 done:
 	return rc;
 }
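
For anyone following along, here is a rough user-space sketch of what the
reordering in the diff changes. It is not kernel code. The struct fake_dev,
the fake_* helpers, pick_reset() and the two order tables are all
hypothetical names made up for illustration, and the reset methods that
come before the hunk (visible only as context in the diff) are left out.
The one thing taken from the diff is the convention that -ENOTTY means
"this method is not available, try the next one".

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the properties that decide which reset applies. */
struct fake_dev {
	int has_pm_reset;   /* device can be reset through power management */
	int has_slot_reset; /* slot supports a hotplug-style reset */
	int alone_on_bus;   /* no other devices share the parent bus */
};

typedef int (*reset_fn)(const struct fake_dev *dev);

/* Each fake method returns 0 if it applies, -ENOTTY if it does not. */
static int fake_pm_reset(const struct fake_dev *dev)
{
	return dev->has_pm_reset ? 0 : -ENOTTY;
}

static int fake_slot_reset(const struct fake_dev *dev)
{
	return dev->has_slot_reset ? 0 : -ENOTTY;
}

static int fake_bus_reset(const struct fake_dev *dev)
{
	return dev->alone_on_bus ? 0 : -ENOTTY;
}

/* Order of the three methods before the diff, and after it. */
static const reset_fn stock_order[] = {
	fake_pm_reset, fake_slot_reset, fake_bus_reset
};
static const reset_fn patched_order[] = {
	fake_slot_reset, fake_bus_reset, fake_pm_reset
};

/*
 * Walk the methods in order and return the first one that does not
 * answer -ENOTTY, or NULL if none applies.
 */
static reset_fn pick_reset(const reset_fn *order, size_t n,
			   const struct fake_dev *dev)
{
	for (size_t i = 0; i < n; i++) {
		if (order[i](dev) != -ENOTTY)
			return order[i];
	}
	return NULL;
}
```

For a device that supports PM reset and sits alone on its bus,
pick_reset(stock_order, 3, &dev) selects fake_pm_reset, while
pick_reset(patched_order, 3, &dev) selects fake_bus_reset. That is the
point of the test kernel: with the reordering, the sysfs reset path should
end up on a bus reset, the same method QEMU prefers.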