From patchwork Wed May 2 17:52:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alan Stern X-Patchwork-Id: 10376215 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 19AAA6037D for ; Wed, 2 May 2018 17:52:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 098FC260CD for ; Wed, 2 May 2018 17:52:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F1FFD283C8; Wed, 2 May 2018 17:52:49 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00, MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6F0CD260CD for ; Wed, 2 May 2018 17:52:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751054AbeEBRwp (ORCPT ); Wed, 2 May 2018 13:52:45 -0400 Received: from iolanthe.rowland.org ([192.131.102.54]:40622 "HELO iolanthe.rowland.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1750947AbeEBRwn (ORCPT ); Wed, 2 May 2018 13:52:43 -0400 Received: (qmail 5079 invoked by uid 2102); 2 May 2018 13:52:42 -0400 Received: from localhost (sendmail-bs@127.0.0.1) by localhost with SMTP; 2 May 2018 13:52:42 -0400 Date: Wed, 2 May 2018 13:52:42 -0400 (EDT) From: Alan Stern X-X-Sender: stern@iolanthe.rowland.org To: Mathias Nyman cc: russianneuromancer@ya.ru, Subject: Re: Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855 In-Reply-To: <62a818ed-2e9b-a5bf-abd1-22a6d4287e65@linux.intel.com> Message-ID: MIME-Version: 1.0 Sender: linux-usb-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-usb@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP On Wed, 2 May 2018, Mathias Nyman wrote: > On 24.04.2018 16:50, Alan Stern wrote: > > On Tue, 24 Apr 2018, Mathias Nyman wrote: > > > >>>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in > >>>>> hcd-pci.c:suspend_common() should prevent the controller from going > >>>>> back into D3. The WAKEUP_PENDING bit gets set in > >>>>> usb_hcd_resume_root_hub() and it doesn't get cleared until > >>>>> hcd_bus_resume() runs. > >>>>> > >>>> > >>>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in this > >>>> specific failing case > >>>> > >>>> xhci_resume() has a check: > >>>> /* Resume root hubs only when have pending events. */ > >>>> status = readl(&xhci->op_regs->status); > >>>> if (status & STS_EINT) { > >>>> usb_hcd_resume_root_hub(xhci->shared_hcd); > >>>> usb_hcd_resume_root_hub(hcd); > >>>> } > >>>> > >>>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM > >>>> can suspend host controller again. when xhci driver finally gets to handle the interrupt > >>>> the controller may be in D3 already > >>>> > >>>> This should only happen if xhci_resume() is called before xhci driver sees a pending interrupt, > >>>> could be possible as xhci has interrupt moderation enabled. > >>> > >>> Then maybe that test should be removed. Calling > >>> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad, > >>> because there probably are not very many times when the controller gets > >>> resumed without the root hub also being resumed. > >>> > >> > >> The check was added to fix system suspend issue on a runtime suspended host: > >> > >> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc > >> > >> xhci: Fix runtime suspended xhci from blocking system suspend. > >> > >> The system suspend flow as following: > >> 1, Freeze all user processes and kenrel threads. > >> > >> 2, Try to suspend all devices. > >> > >> 2.1, If pci device is in RPM suspended state, then pci driver will try > >> to resume it to RPM active state in the prepare stage. > >> > >> 2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two > >> workqueue items to resume usb2&usb3 roothub devices. > >> > >> 2.3, Call suspend callbacks of devices. > >> > >> 2.3.1, All suspend callbacks of all hcd's children, including > >> roothub devices are called. > >> > >> 2.3.2, Finally, hcd_pci_suspend callback is called. > >> > >> Due to workqueue threads were already frozen in step 1, the workqueue > >> items can't be scheduled, and the roothub devices can't be resumed in > >> this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in > >> usb_hcd_resume_root_hub won't be cleared. Finally, > >> hcd_pci_suspend will return -EBUSY, and system suspend fails. > > > > Hmmm. I don't recall seeing this problem occur with ehci-hcd. But > > then, I haven't tested it very much recently. > > > > We could change to a different work queue, one that doesn't get > > frozen. But there's no guarantee that the work items would run before > > your step 2.3.2. > > > > Maybe we can avoid step 2.1. I think there have been some recent > > changes to the PM code in this area. There may be a flag you can set > > that will prevent the PCI core from resuming the host controller. > > > > Or maybe we can change step 2.3.1, so that the root hub's suspend > > callback will first do a resume if the WAKEUP_PENDING flag is set. > > That might be the most reliable approach. > > > > I'm not sure I understand the last suggestion, could you open up how it > would work? Here's what I had in mind. See if you think this would work. Consider choose_wakeup() in core/driver.c. That subroutine gets called by usb_suspend() when step 2.3.1 wants to suspend a USB device. We could patch it as follows: > I started approaching this from another direction, mainly making sure we don't > immediately runtime suspend the host controller after resume. Adding a next_statechange > minimal time between resume and suspend, and checking for pending events in xhci_suspend(). > > I'll have some patches to test for russianneuromancer@ya.ru soon > > These are generic checks that ehci_suspend() also does To tell the truth, I'm not sure how necessary those next_statechange tests really are. Alan Stern --- To unsubscribe from this list: send the line "unsubscribe linux-usb" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html --- usb-4.x.orig/drivers/usb/core/driver.c +++ usb-4.x/drivers/usb/core/driver.c @@ -1449,11 +1449,21 @@ static void choose_wakeup(struct usb_dev */ w = device_may_wakeup(&udev->dev); - /* If the device is autosuspended with the wrong wakeup setting, + /* + * If the device is autosuspended with the wrong wakeup setting, * autoresume now so the setting can be changed. + * + * Likewise, if the device is an autosuspended root hub and the + * hcd needs to wake it up before the controller can be suspended, + * resume it now to clear the WAKEUP_PENDING flag. */ - if (udev->state == USB_STATE_SUSPENDED && w != udev->do_remote_wakeup) - pm_runtime_resume(&udev->dev); + if (udev->state == USB_STATE_SUSPENDED) { + struct usb_hcd *hcd = bus_to_hcd(udev->bus); + + if (w != udev->do_remote_wakeup || + (!udev->parent && HCD_WAKEUP_PENDING(hcd))) + pm_runtime_resume(&udev->dev); + } udev->do_remote_wakeup = w; }