From patchwork Thu Jan 25 23:29:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lyude Paul X-Patchwork-Id: 10184885 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id 2666660388 for ; Thu, 25 Jan 2018 23:30:21 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1BFD828B36 for ; Thu, 25 Jan 2018 23:30:21 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1949128B3B; Thu, 25 Jan 2018 23:30:21 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.2 required=2.0 tests=BAYES_00, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2645E28B36 for ; Thu, 25 Jan 2018 23:30:19 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1E66E89C84; Thu, 25 Jan 2018 23:30:17 +0000 (UTC) X-Original-To: dri-devel@lists.freedesktop.org Delivered-To: dri-devel@lists.freedesktop.org Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by gabe.freedesktop.org (Postfix) with ESMTPS id A661189C84; Thu, 25 Jan 2018 23:30:15 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 2BC66883A0; Thu, 25 Jan 2018 23:30:15 +0000 (UTC) Received: from whitewolf.lyude.net.com (ovpn-120-204.rdu2.redhat.com [10.10.120.204]) by smtp.corp.redhat.com (Postfix) with ESMTP id C4DF860C4E; Thu, 25 Jan 2018 23:30:10 +0000 (UTC) From: Lyude Paul To: nouveau@lists.freedesktop.org Subject: [PATCH v3] drm/nouveau: Move irq setup/teardown to pci ctor/dtor Date: Thu, 25 Jan 2018 18:29:53 -0500 Message-Id: <20180125232955.14574-1-lyude@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Thu, 25 Jan 2018 23:30:15 +0000 (UTC) X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Karol Herbst , David Airlie , dri-devel@lists.freedesktop.org, Mike Galbraith , linux-kernel@vger.kernel.org, stable@vger.kernel.org, Ben Skeggs , Thomas Gleixner , Thierry Reding MIME-Version: 1.0 Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" X-Virus-Scanned: ClamAV using ClamSMTP For a while we've been having issues with seemingly random interrupts coming from nvidia cards when resuming them. Originally the fix for this was thought to be just re-arming the MSI interrupt registers right after re-allocating our IRQs, however it seems a lot of what we do is both wrong and not even nessecary. This was made apparent by what appeared to be a regression in the mainline kernel that started introducing suspend/resume issues for nouveau: a0c9259dc4e1 (irq/matrix: Spread interrupts on allocation) After this commit was introduced, we started getting interrupts from the GPU before we actually re-allocated our own IRQ (see references below) and assigned the IRQ handler. Investigating this turned out that the problem was not with the commit, but the fact that nouveau even free/allocates it's irqs before and after suspend/resume. For starters: drivers in the linux kernel haven't had to handle freeing/re-allocating their IRQs during suspend/resume cycles for quite a while now. Nouveau seems to be one of the few drivers left that still does this, despite the fact there's no reason we actually need to since disabling interrupts from the device side should be enough, as the kernel is already smart enough to know to disable host-side interrupts for us before going into suspend. Since we were tearing down our IRQs by hand however, that means there was a short period during resume where interrupts could be received before we re-allocated our IRQ which would lead to us getting an unhandled IRQ. Since we never handle said IRQ and re-arm the interrupt registers, this would cause us to miss all of the interrupts from the GPU and cause our init process to start timing out on anything requiring interrupts. So, since this whole setup/teardown every suspend/resume cycle is useless anyway, move irq setup/teardown into the pci subdev's ctor/dtor functions instead so they're only called at driver load and driver unload. This should fix most of the issues with pending interrupts on resume, along with getting suspend/resume for nouveau to work again. As well, this probably means we can also just remove the msi rearm call inside nvkm_pci_init(). But since our main focus here is to fix suspend/resume before 4.15, we'll save that for a later patch. Signed-off-by: Lyude Paul Cc: Karol Herbst Cc: Thomas Gleixner Cc: Mike Galbraith Cc: stable@vger.kernel.org --- Changes since v2: - Remove teardown, just reuse pci->irq to indicate when we're tearing down the driver drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c | 44 +++++++++++++++++--------- 1 file changed, 29 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c index deb96de54b00..3b2cad639388 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c @@ -71,6 +71,10 @@ nvkm_pci_intr(int irq, void *arg) struct nvkm_pci *pci = arg; struct nvkm_device *device = pci->subdev.device; bool handled = false; + + if (pci->irq < 0) + return IRQ_HANDLED; + nvkm_mc_intr_unarm(device); if (pci->msi) pci->func->msi_rearm(pci); @@ -84,11 +88,6 @@ nvkm_pci_fini(struct nvkm_subdev *subdev, bool suspend) { struct nvkm_pci *pci = nvkm_pci(subdev); - if (pci->irq >= 0) { - free_irq(pci->irq, pci); - pci->irq = -1; - } - if (pci->agp.bridge) nvkm_agp_fini(pci); @@ -108,8 +107,20 @@ static int nvkm_pci_oneinit(struct nvkm_subdev *subdev) { struct nvkm_pci *pci = nvkm_pci(subdev); - if (pci_is_pcie(pci->pdev)) - return nvkm_pcie_oneinit(pci); + struct pci_dev *pdev = pci->pdev; + int ret; + + if (pci_is_pcie(pci->pdev)) { + ret = nvkm_pcie_oneinit(pci); + if (ret) + return ret; + } + + ret = request_irq(pdev->irq, nvkm_pci_intr, IRQF_SHARED, "nvkm", pci); + if (ret) + return ret; + pci->irq = pdev->irq; + return 0; } @@ -117,7 +128,6 @@ static int nvkm_pci_init(struct nvkm_subdev *subdev) { struct nvkm_pci *pci = nvkm_pci(subdev); - struct pci_dev *pdev = pci->pdev; int ret; if (pci->agp.bridge) { @@ -131,28 +141,32 @@ nvkm_pci_init(struct nvkm_subdev *subdev) if (pci->func->init) pci->func->init(pci); - ret = request_irq(pdev->irq, nvkm_pci_intr, IRQF_SHARED, "nvkm", pci); - if (ret) - return ret; - - pci->irq = pdev->irq; - /* Ensure MSI interrupts are armed, for the case where there are * already interrupts pending (for whatever reason) at load time. */ if (pci->msi) pci->func->msi_rearm(pci); - return ret; + return 0; } static void * nvkm_pci_dtor(struct nvkm_subdev *subdev) { struct nvkm_pci *pci = nvkm_pci(subdev); + int irq; + nvkm_agp_dtor(pci); + + if (pci->irq >= 0) { + irq = pci->irq; + pci->irq = -1; + free_irq(irq, pci); + } + if (pci->msi) pci_disable_msi(pci->pdev); + return nvkm_pci(subdev); }