From patchwork Sat Apr 15 17:01:24 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 9682317 X-Patchwork-Delegate: bhelgaas@google.com Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id C5D2660132 for ; Sat, 15 Apr 2017 17:33:08 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id B9B3626E94 for ; Sat, 15 Apr 2017 17:33:08 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id AEB122846F; Sat, 15 Apr 2017 17:33:08 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 678E826E94 for ; Sat, 15 Apr 2017 17:33:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755948AbdDORcY (ORCPT ); Sat, 15 Apr 2017 13:32:24 -0400 Received: from Galois.linutronix.de ([146.0.238.70]:44529 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755567AbdDORbc (ORCPT ); Sat, 15 Apr 2017 13:31:32 -0400 Received: from localhost ([127.0.0.1] helo=[127.0.1.1]) by Galois.linutronix.de with esmtp (Exim 4.80) (envelope-from ) id 1czRWX-00085v-OX; Sat, 15 Apr 2017 19:30:13 +0200 Message-Id: <20170415171652.675049942@linutronix.de> User-Agent: quilt/0.63-1 Date: Sat, 15 Apr 2017 19:01:24 +0200 From: Thomas Gleixner To: LKML Cc: Peter Zijlstra , Ingo Molnar , Steven Rostedt , Sebastian Siewior , Bjorn Helgaas , linux-pci@vger.kernel.org Subject: [patch 17/20] PCI: Use cpu_hotplug_disable() instead of get_online_cpus() References: <20170415170107.643253702@linutronix.de> MIME-Version: 1.0 Content-Disposition: inline; filename=PCI--Use-cpu_hotplug_disable()-instead-of-get_online_cpus().patch Sender: linux-pci-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pci@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Converting the hotplug locking, i.e. get_online_cpus(), to a percpu rwsem unearthed a circular lock dependency which was hidden from lockdep due to the lockdep annotation of get_online_cpus() which prevents lockdep from creating full dependency chains. There are several variants of this. And example is: Chain exists of: cpu_hotplug_lock.rw_sem --> drm_global_mutex --> &item->mutex CPU0 CPU1 ---- ---- lock(&item->mutex); lock(drm_global_mutex); lock(&item->mutex); lock(cpu_hotplug_lock.rw_sem); because there are dependencies through workqueues. The call chain is: get_online_cpus apply_workqueue_attrs __alloc_workqueue_key ttm_mem_global_init ast_ttm_mem_global_init drm_global_item_ref ast_mm_init ast_driver_load drm_dev_register drm_get_pci_dev ast_pci_probe local_pci_probe work_for_cpu_fn process_one_work worker_thread This is not a problem of get_online_cpus() recursion, it's a possible deadlock undetected by lockdep so far. The cure is to use cpu_hotplug_disable() instead of get_online_cpus() to protect the PCI probing. There is a side effect to this: cpu_hotplug_disable() makes a concurrent cpu hotplug attempt via the sysfs interfaces fail with -EBUSY, but PCI probing usually happens during the boot process where no interaction is possible. Any later invocations are infrequent enough and concurrent hotplug attempts are so unlikely that the danger of user space visible regressions is very close to zero. Anyway, thats preferrable over a real deadlock. Signed-off-by: Thomas Gleixner Cc: Bjorn Helgaas Cc: linux-pci@vger.kernel.org Acked-by: Bjorn Helgaas --- drivers/pci/pci-driver.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -320,10 +320,19 @@ static long local_pci_probe(void *_ddi) return 0; } +static bool pci_physfn_is_probed(struct pci_dev *dev) +{ +#ifdef CONFIG_ATS + return dev->physfn->is_probed; +#else + return false; +#endif +} + static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev, const struct pci_device_id *id) { - int error, node; + int error, node, cpu; struct drv_dev_and_id ddi = { drv, dev, id }; /* @@ -349,13 +358,13 @@ static int pci_call_probe(struct pci_dri if (node >= 0 && node != numa_node_id()) { int cpu; - get_online_cpus(); + cpu_hotplug_disable(); cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask); if (cpu < nr_cpu_ids) error = work_on_cpu(cpu, local_pci_probe, &ddi); else error = local_pci_probe(&ddi); - put_online_cpus(); + cpu_hotplug_enable(); } else error = local_pci_probe(&ddi);