[RFC,1/2] PCI: pciehp: Merge hotplug work requests

Some oddball devices may experience a PCIe link flap after power-on.
This may result in the following sequence of events.

fpc0 kernel: pciehp 0000:02:08.0:pcie24: Card present on Slot(0)
fpc0 kernel: pciehp 0000:02:08.0:pcie24: slot(0): Link Up event
fpc0 kernel: pciehp 0000:02:08.0:pcie24:
	Link Up event ignored on slot(0): already powering on
fpc0 kernel: pciehp 0000:02:08.0:pcie24: slot(0): Link Down event
fpc0 kernel: pciehp 0000:02:08.0:pcie24:
	Link Down event queued on slot(0): currently getting powered on
fpc0 kernel: pciehp 0000:02:08.0:pcie24: slot(0): Link Up event
fpc0 kernel: pciehp 0000:02:08.0:pcie24:
	Link Up event queued on slot(0): currently getting powered off

This causes the driver for affected devices to be instantiated, removed,
and re-instantiated.

An even worse problem is a device which resets itself continuously.
This can result in the following endless sequence of messages.

pciehp 0000:02:0a.0:pcie24: Card present on Slot(10)
pciehp 0000:02:0a.0:pcie24: Card not present on Slot(10)
pciehp 0000:02:0a.0:pcie24: Card present on Slot(10)
pciehp 0000:02:0a.0:pcie24: Card not present on Slot(10)
pciehp 0000:02:0a.0:pcie24: Card present on Slot(10)
pciehp 0000:02:0a.0:pcie24: Card not present on Slot(10)

The problem in both cases is that all events are enqueued as hotplug
work requests and executed in sequence, which can overwhelm the system
and even result in "hung task" tracebacks in pciehp_power_thread().

This exposes an underlying limitation of the hotplug state machine: It
executes all received requests, even though only the most recent request
really needs to be handled. As a result, by the time a request is handled,
it may well be obsolete and have been superseded by many other enable /
disable requests which have been enqueued in the meantime.

To solve the problem, fold hotplug work requests into a single request.
Store the request as well as the work data structure in 'struct slot',
thus eliminating the need to allocate memory for each request.
Handle a sequence of requests by setting a 'disable' flag when needed,
indicating that a link needs to be disabled prior to re-enabling it.

With this change, requests and request sequences are handled as follows.

enable (single request):		enable link
disable (single request):		disable link
... disable-enable-disable...disable:	disable link
... disable-enable-disable...enable:	disable link, then enable it
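
The folding rules above can be sketched in plain C roughly as follows.
This is an illustration only, not code from the patch: the names
(slot_sketch, fold_request, handle_pending, REQ_*, ACT_*) are hypothetical,
and the locking and workqueue handling that real code would need are
omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request types; these names are not taken from the patch. */
enum req_type { REQ_NONE, REQ_ENABLE, REQ_DISABLE };

/* Actions the worker would perform, encoded as a bit mask. */
#define ACT_DISABLE	0x1
#define ACT_ENABLE	0x2

/* Hypothetical stand-in for the request state kept per slot. */
struct slot_sketch {
	enum req_type pending;	/* most recent request not yet handled */
	bool disable;		/* link must be disabled before re-enabling */
};

/* Fold a new request into the single pending request of the slot. */
static void fold_request(struct slot_sketch *s, enum req_type req)
{
	assert(s != NULL);
	if (req == REQ_DISABLE)
		s->disable = true;	/* remember that a disable was seen */
	s->pending = req;		/* only the latest request survives */
}

/*
 * Handle the folded request. A set ACT_DISABLE bit means "disable link";
 * if ACT_ENABLE is also set, the link is enabled after being disabled.
 */
static unsigned int handle_pending(struct slot_sketch *s)
{
	unsigned int actions = 0;

	assert(s != NULL);
	if (s->disable)
		actions |= ACT_DISABLE;
	if (s->pending == REQ_ENABLE)
		actions |= ACT_ENABLE;
	s->pending = REQ_NONE;
	s->disable = false;
	return actions;
}
```

In this sketch, no memory is allocated per request because the state is
embedded in the slot structure, and any number of queued requests
collapses into at most one disable followed by at most one enable.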

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
This is a different approach to solve the problem I tried to address
earlier with "PCI: pciehp: Add support for delayed power-on" [1].

While the earlier patch implemented an additional state in the hotplug
state machine to solve the problem, the approach taken here is a bit
simpler and more straightforward. By folding multiple requests into one,
the follow-up patch can use delayed work to implement power-on delays.
An additional advantage is that this patch can be applied separately
to simplify the state machine.

While working on this patch, I also tried to drop multiple "disable"
requests, and only disable a slot if it was not already disabled, to
reduce overhead. Unfortunately, this did not always work. In some instances,
I ended up in a situation where a device could not be enabled
because the system thought that it already existed. I don't know
if this is a weakness in this patch or some state change I did not catch. 
This may be left for further study.

[1] https://lkml.org/lkml/2015/10/12/686

 drivers/pci/hotplug/pciehp.h      |  4 +++
 drivers/pci/hotplug/pciehp_ctrl.c | 52 ++++++++++++++++++---------------------
 drivers/pci/hotplug/pciehp_hpc.c  |  1 +
 3 files changed, 29 insertions(+), 28 deletions(-)

Message ID	1446522496-21628-1-git-send-email-linux@roeck-us.net (mailing list archive)
State	New, archived
Delegated to:	Bjorn Helgaas
From: Guenter Roeck <linux@roeck-us.net>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org, Guenter Roeck <linux@roeck-us.net>
Subject: [RFC PATCH 1/2] PCI: pciehp: Merge hotplug work requests
Date: Mon, 2 Nov 2015 19:48:15 -0800
