From patchwork Tue Oct 28 21:51:08 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alex Williamson <alex.williamson@redhat.com>
X-Patchwork-Id: 5181541
X-Patchwork-Delegate: bhelgaas@google.com
Return-Path: <linux-pci-owner@kernel.org>
X-Original-To: patchwork-linux-pci@patchwork.kernel.org
Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org
Received: from mail.kernel.org (mail.kernel.org [198.145.19.201])
	by patchwork2.web.kernel.org (Postfix) with ESMTP id 1735AC11AC
	for <patchwork-linux-pci@patchwork.kernel.org>;
	Tue, 28 Oct 2014 21:51:19 +0000 (UTC)
Received: from mail.kernel.org (localhost [127.0.0.1])
	by mail.kernel.org (Postfix) with ESMTP id 3542A20176
	for <patchwork-linux-pci@patchwork.kernel.org>;
	Tue, 28 Oct 2014 21:51:18 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 1F8702015A
	for <patchwork-linux-pci@patchwork.kernel.org>;
	Tue, 28 Oct 2014 21:51:17 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754902AbaJ1VvQ (ORCPT
	<rfc822;patchwork-linux-pci@patchwork.kernel.org>);
	Tue, 28 Oct 2014 17:51:16 -0400
Received: from mx1.redhat.com ([209.132.183.28]:59055 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751141AbaJ1VvP (ORCPT <rfc822;linux-pci@vger.kernel.org>);
	Tue, 28 Oct 2014 17:51:15 -0400
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
	(int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s9SLp95Q030091
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256
	verify=FAIL); Tue, 28 Oct 2014 17:51:09 -0400
Received: from [10.3.113.81] (ovpn-113-81.phx2.redhat.com [10.3.113.81])
	by int-mx09.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with
	ESMTP id s9SLp8tm019115; Tue, 28 Oct 2014 17:51:09 -0400
Message-ID: <1414533068.27420.226.camel@ul30vt.home>
Subject: Re: Hard and silent lock up since linux 3.14 with PCIe pass through
	(vfio)
From: Alex Williamson <alex.williamson@redhat.com>
To: Andreas Hartmann <andihartmann@freenet.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci <linux-pci@vger.kernel.org>
Date: Tue, 28 Oct 2014 15:51:08 -0600
In-Reply-To: <544B3D14.70907@maya.org>
References: <20140923210318.498dacbd@dualc.maya.org>
	<1411502866.24563.8.camel@ul30vt.home> <5437A958.3000201@maya.org>
	<CAErSpo5kkk4itqqmCMU=08-XMcDknur8VY++F3A2LajdvTptOA@mail.gmail.com>
	<5437F1F5.3010706@maya.org>
	<CAErSpo6DuWEcJb_gfS=5JnKN8b4DXXV9+u-CRa8m3db0dbzpsQ@mail.gmail.com>
	<543804BC.3080307@maya.org>
	<CAErSpo450czUtAM104o+RS=yOcmP+3MKMH7mYk=UYhVJ2+cfSQ@mail.gmail.com>
	<20141011003219.560cca97@dualc.maya.org>
	<20141010225408.GA24493@google.com> <5438CC1E.3060407@maya.org>
	<1413360267.4202.70.camel@ul30vt.home> <54406B34.1050808@maya.org>
	<1413925580.4202.189.camel@ul30vt.home>
	<1413927152.4202.195.camel@ul30vt.home> <5447D9D9.9030909@maya.org>
	<1414010215.4202.275.camel@ul30vt.home> <54492606.5090308@maya.org>
	<1414082022.27420.39.camel@ul30vt.home> <54493BFA.8010609@maya.org>
	<1414093023.27420.40.camel@ul30vt.home> <544B3D14.70907@maya.org>
Mime-Version: 1.0
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.22
Sender: linux-pci-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-pci.vger.kernel.org>
X-Mailing-List: linux-pci@vger.kernel.org
X-Spam-Status: No, score=-7.5 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_HI,
	RP_MATCHES_RCVD,
	UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Sat, 2014-10-25 at 08:03 +0200, Andreas Hartmann wrote:
> 
> Out of interest:
> Bjorn's patch disables vc save/restore support - and the machine works
> fine again. Why is it needed at all if it seems to work perfectly w/o
> it? What's the additional benefit? Or in other words: What am I missing
> until today :-) ? What would be better? What could I do more?


You're right: in your configuration the endpoint device has a Virtual
Channel capability but the upstream root port does not.  The spec is not
at all clear about defining the endpoints for enabling Virtual Channel
in each type of configuration, but I think that if the upstream port
does not support Virtual Channel, we can skip the save/restore.  Please
test the patch below.
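
To make the rule concrete, here is a minimal userspace sketch of the
decision, not the kernel code itself.  The struct, enum and function
names (model_dev, model_cap, model_vc_needs_save) are made up for
illustration; a NULL upstream pointer stands in for "device is on a root
bus", and has_vc stands in for the upstream port exposing a VC
capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VC/VC9/MFVC capability IDs. */
enum model_cap { MODEL_CAP_VC, MODEL_CAP_VC9, MODEL_CAP_MFVC };

/* Hypothetical, heavily reduced model of a PCI device. */
struct model_dev {
	const struct model_dev *upstream; /* NULL when on a root bus */
	bool has_vc;                      /* exposes a VC capability */
};

/*
 * Mirrors the proposed rule: always save for VC9, always save on a
 * root bus, otherwise save only if the upstream port also has VC.
 */
static bool model_vc_needs_save(const struct model_dev *dev,
				enum model_cap id)
{
	assert(dev != NULL);

	if (id == MODEL_CAP_VC9)
		return true;

	if (dev->upstream == NULL)
		return true;

	return dev->upstream->has_vc;
}
```

With this model, an endpoint that has VC behind a root port without VC
(the configuration described above) skips save/restore for VC and MFVC,
while VC9 is still saved.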

I'm also still completely confused about whether this is a VC
save/restore issue or a bus reset issue.  You originally bisected this
back to the VC save/restore patch, but you also found that a manual,
setpci-based bus reset triggered a system hang.  I believe that
re-ordering the kernel reset mechanisms also triggered this.  Since
recent versions of QEMU are going to favor a bus reset over PM reset, I
don't have a lot of confidence that we're actually solving the problem
for you.  Please make sure to test with a recent QEMU to be sure we'll
do a bus reset.  Thanks,

Alex
---

diff --git a/drivers/pci/vc.c b/drivers/pci/vc.c
index 7e1304d..6d13d34 100644
--- a/drivers/pci/vc.c
+++ b/drivers/pci/vc.c
@@ -339,6 +339,25 @@ static int pci_vc_do_save_buffer(struct pci_dev *dev, int pos,
 	return buf ? 0 : len;
 }
 
+/**
+ * pci_vc_needs_save - Determine whether a VC capability needs to be saved
+ * @dev: device
+ * @id: VC capability ID (VC/VC9/MFVC)
+ *
+ * In configurations where we have a VC or MFVC capability, but the upstream
+ * device does not, we assume that VC save (and therefore restore) is not
+ * necessary.  The intention is to only do VC save/restore in configurations
+ * where it's necessary and hopefully avoid reset issues.
+ */
+static bool pci_vc_needs_save(struct pci_dev *dev, u16 id)
+{
+	if (id == PCI_EXT_CAP_ID_VC9 || pci_is_root_bus(dev->bus) ||
+	    pci_find_ext_capability(dev->bus->self, PCI_EXT_CAP_ID_VC))
+		return true;
+
+	return false;
+}
+
 static struct {
 	u16 id;
 	const char *name;
@@ -362,7 +381,7 @@ int pci_save_vc_state(struct pci_dev *dev)
 		struct pci_cap_saved_state *save_state;
 
 		pos = pci_find_ext_capability(dev, vc_caps[i].id);
-		if (!pos)
+		if (!pos || !pci_vc_needs_save(dev, vc_caps[i].id))
 			continue;
 
 		save_state = pci_find_saved_ext_cap(dev, vc_caps[i].id);
@@ -422,7 +441,7 @@ void pci_allocate_vc_save_buffers(struct pci_dev *dev)
 	for (i = 0; i < ARRAY_SIZE(vc_caps); i++) {
 		int len, pos = pci_find_ext_capability(dev, vc_caps[i].id);
 
-		if (!pos)
+		if (!pos || !pci_vc_needs_save(dev, vc_caps[i].id))
 			continue;
 
 		len = pci_vc_do_save_buffer(dev, pos, NULL, false);