From patchwork Wed May  2 17:52:42 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alan Stern <stern@rowland.harvard.edu>
X-Patchwork-Id: 10376215
Return-Path: <linux-usb-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	19AAA6037D for <patchwork-linux-usb@patchwork.kernel.org>;
	Wed,  2 May 2018 17:52:50 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 098FC260CD
	for <patchwork-linux-usb@patchwork.kernel.org>;
	Wed,  2 May 2018 17:52:50 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id F1FFD283C8; Wed,  2 May 2018 17:52:49 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6F0CD260CD
	for <patchwork-linux-usb@patchwork.kernel.org>;
	Wed,  2 May 2018 17:52:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751054AbeEBRwp (ORCPT
	<rfc822;patchwork-linux-usb@patchwork.kernel.org>);
	Wed, 2 May 2018 13:52:45 -0400
Received: from iolanthe.rowland.org ([192.131.102.54]:40622 "HELO
	iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with SMTP id S1750947AbeEBRwn (ORCPT
	<rfc822;linux-usb@vger.kernel.org>); Wed, 2 May 2018 13:52:43 -0400
Received: (qmail 5079 invoked by uid 2102); 2 May 2018 13:52:42 -0400
Received: from localhost (sendmail-bs@127.0.0.1)
	by localhost with SMTP; 2 May 2018 13:52:42 -0400
Date: Wed, 2 May 2018 13:52:42 -0400 (EDT)
From: Alan Stern <stern@rowland.harvard.edu>
X-X-Sender: stern@iolanthe.rowland.org
To: Mathias Nyman <mathias.nyman@linux.intel.com>
cc: russianneuromancer@ya.ru, <linux-usb@vger.kernel.org>
Subject: Re: Since Linux 4.13 tlp or powertop usage cause "xHCI host
	controller not responding, assume dead" on Dell 5855
In-Reply-To: <62a818ed-2e9b-a5bf-abd1-22a6d4287e65@linux.intel.com>
Message-ID: <Pine.LNX.4.44L0.1805021334100.1505-100000@iolanthe.rowland.org>
MIME-Version: 1.0
Sender: linux-usb-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-usb.vger.kernel.org>
X-Mailing-List: linux-usb@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On Wed, 2 May 2018, Mathias Nyman wrote:

> On 24.04.2018 16:50, Alan Stern wrote:
> > On Tue, 24 Apr 2018, Mathias Nyman wrote:
> > 
> >>>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
> >>>>> hcd-pci.c:suspend_common() should prevent the controller from going
> >>>>> back into D3.  The WAKEUP_PENDING bit gets set in
> >>>>> usb_hcd_resume_root_hub() and it doesn't get cleared until
> >>>>> hcd_bus_resume() runs.
> >>>>>
> >>>>
> >>>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this
> >>>> specific failing case
> >>>>
> >>>> xhci_resume() has a check:
> >>>> /* Resume root hubs only when have pending events. */
> >>>>      status = readl(&xhci->op_regs->status);
> >>>>        if (status & STS_EINT) {
> >>>>          usb_hcd_resume_root_hub(xhci->shared_hcd);
> >>>>          usb_hcd_resume_root_hub(hcd);
> >>>>        }
> >>>>
> >>>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
> >>>> can suspend host controller again. when xhci driver finally gets to handle the interrupt
> >>>> the controller may be in D3 already
> >>>>
> >>>> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt,
> >>>> could be possible as xhci has interrupt moderation enabled.
> >>>
> >>> Then maybe that test should be removed.  Calling
> >>> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
> >>> because there probably are not very many times when the controller gets
> >>> resumed without the root hub also being resumed.
> >>>
> >>
> >> The check was added to fix system suspend issue on a runtime suspended host:
> >>
> >> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc
> >>
> >>       xhci: Fix runtime suspended xhci from blocking system suspend.
> >>       
> >>       The system suspend flow as following:
> >>       1, Freeze all user processes and kenrel threads.
> >>       
> >>       2, Try to suspend all devices.
> >>       
> >>       2.1, If pci device is in RPM suspended state, then pci driver will try
> >>       to resume it to RPM active state in the prepare stage.
> >>       
> >>       2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
> >>       workqueue items to resume usb2&usb3 roothub devices.
> >>       
> >>       2.3, Call suspend callbacks of devices.
> >>       
> >>       2.3.1, All suspend callbacks of all hcd's children, including
> >>       roothub devices are called.
> >>       
> >>       2.3.2, Finally, hcd_pci_suspend callback is called.
> >>       
> >>       Due to workqueue threads were already frozen in step 1, the workqueue
> >>       items can't be scheduled, and the roothub devices can't be resumed in
> >>       this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
> >>       usb_hcd_resume_root_hub won't be cleared. Finally,
> >>       hcd_pci_suspend will return -EBUSY, and system suspend fails.
> > 
> > Hmmm.  I don't recall seeing this problem occur with ehci-hcd.  But
> > then, I haven't tested it very much recently.
> > 
> > We could change to a different work queue, one that doesn't get
> > frozen.  But there's no guarantee that the work items would run before
> > your step 2.3.2.
> > 
> > Maybe we can avoid step 2.1.  I think there have been some recent
> > changes to the PM code in this area.  There may be a flag you can set
> > that will prevent the PCI core from resuming the host controller.
> > 
> > Or maybe we can change step 2.3.1, so that the root hub's suspend
> > callback will first do a resume if the WAKEUP_PENDING flag is set.
> > That might be the most reliable approach.
> > 
> 
> I'm not sure I understand the last suggestion, could you open up how it
> would work?

Here's what I had in mind.  See if you think this would work.

Consider choose_wakeup() in core/driver.c.  That subroutine gets called
by usb_suspend() when step 2.3.1 wants to suspend a USB device.  We
could patch it as follows:


> I started approaching this from another direction, mainly making sure we don't
> immediately runtime suspend the host controller after resume. Adding a next_statechange
> minimal time between resume and suspend, and checking for pending events in xhci_suspend().
> 
> I'll have some patches to test for russianneuromancer@ya.ru soon
> 
> These are generic checks that ehci_suspend() also does

To tell the truth, I'm not sure how necessary those next_statechange 
tests really are.

Alan Stern
---
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--- usb-4.x.orig/drivers/usb/core/driver.c
+++ usb-4.x/drivers/usb/core/driver.c
@@ -1449,11 +1449,21 @@ static void choose_wakeup(struct usb_dev
 	 */
 	w = device_may_wakeup(&udev->dev);
 
-	/* If the device is autosuspended with the wrong wakeup setting,
+	/*
+	 * If the device is autosuspended with the wrong wakeup setting,
 	 * autoresume now so the setting can be changed.
+	 *
+	 * Likewise, if the device is an autosuspended root hub and the
+	 * hcd needs to wake it up before the controller can be suspended,
+	 * resume it now to clear the WAKEUP_PENDING flag.
 	 */
-	if (udev->state == USB_STATE_SUSPENDED && w != udev->do_remote_wakeup)
-		pm_runtime_resume(&udev->dev);
+	if (udev->state == USB_STATE_SUSPENDED) {
+		struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
+
+		if (w != udev->do_remote_wakeup ||
+				(!udev->parent && HCD_WAKEUP_PENDING(hcd)))
+			pm_runtime_resume(&udev->dev);
+	}
 	udev->do_remote_wakeup = w;
 }