From patchwork Thu Jan 3 22:50:30 2019
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 10747831
X-Patchwork-Delegate: bhelgaas@google.com
From: Keith Busch
To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Ming Lei,
    linux-nvme@lists.infradead.org, Bjorn Helgaas, linux-pci@vger.kernel.org
Cc: Keith Busch
Subject: [PATCHv2 1/4] nvme-pci: Set tagset nr_maps just once
Date: Thu, 3 Jan 2019 15:50:30 -0700
Message-Id: <20190103225033.11249-2-keith.busch@intel.com>
In-Reply-To: <20190103225033.11249-1-keith.busch@intel.com>
References: <20190103225033.11249-1-keith.busch@intel.com>
X-Mailing-List: linux-pci@vger.kernel.org

The driver overwrites the intermediate nr_maps assignments with
HCTX_MAX_TYPES, so remove those unnecessary temporary settings.

Signed-off-by: Keith Busch
Reviewed-by: Ming Lei
---
 drivers/nvme/host/pci.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 5a0bf6a24d50..98332d0a80f0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2291,9 +2291,6 @@ static int nvme_dev_add(struct nvme_dev *dev)
 	if (!dev->ctrl.tagset) {
 		dev->tagset.ops = &nvme_mq_ops;
 		dev->tagset.nr_hw_queues = dev->online_queues - 1;
-		dev->tagset.nr_maps = 2; /* default + read */
-		if (dev->io_queues[HCTX_TYPE_POLL])
-			dev->tagset.nr_maps++;
 		dev->tagset.nr_maps = HCTX_MAX_TYPES;
 		dev->tagset.timeout = NVME_IO_TIMEOUT;
 		dev->tagset.numa_node = dev_to_node(dev->dev);

From patchwork Thu Jan 3 22:50:31 2019
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 10747833
X-Patchwork-Delegate: bhelgaas@google.com
From: Keith Busch
To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Ming Lei,
    linux-nvme@lists.infradead.org, Bjorn Helgaas, linux-pci@vger.kernel.org
Cc: Keith Busch
Subject: [PATCHv2 2/4] nvme-pci: Distribute io queue types after creation
Date: Thu, 3 Jan 2019 15:50:31 -0700
Message-Id: <20190103225033.11249-3-keith.busch@intel.com>
In-Reply-To: <20190103225033.11249-1-keith.busch@intel.com>
References: <20190103225033.11249-1-keith.busch@intel.com>
X-Mailing-List: linux-pci@vger.kernel.org

The dev->io_queues types were set based on the results of the nvme set
feature "number of queues" and the IRQ allocation. That result does not
mean we will successfully allocate and create those IO queues, though.
A failure there will cause blk-mq to have NULL hctx's because the map's
nr_hw_queues accounts for more queues than were actually created.

Adjust the io_queues types after we've created them when we have fewer
than originally desired.

Fixes: 3b6592f70ad7b ("nvme: utilize two queue maps, one for reads and one for writes")
Signed-off-by: Keith Busch
Reviewed-by: Ming Lei
---
 drivers/nvme/host/pci.c | 46 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 98332d0a80f0..1481bb6d9c42 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1733,6 +1733,30 @@ static int nvme_pci_configure_admin_queue(struct nvme_dev *dev)
 	return result;
 }
 
+static void nvme_distribute_queues(struct nvme_dev *dev, unsigned int io_queues)
+{
+	unsigned int irq_queues, this_p_queues = dev->io_queues[HCTX_TYPE_POLL],
+			this_w_queues = dev->io_queues[HCTX_TYPE_DEFAULT];
+
+	if (!io_queues) {
+		dev->io_queues[HCTX_TYPE_POLL] = 0;
+		dev->io_queues[HCTX_TYPE_DEFAULT] = 0;
+		dev->io_queues[HCTX_TYPE_READ] = 0;
+		return;
+	}
+
+	if (this_p_queues >= io_queues)
+		this_p_queues = io_queues - 1;
+	irq_queues = io_queues - this_p_queues;
+
+	if (this_w_queues > irq_queues)
+		this_w_queues = irq_queues;
+
+	dev->io_queues[HCTX_TYPE_POLL] = this_p_queues;
+	dev->io_queues[HCTX_TYPE_DEFAULT] = this_w_queues;
+	dev->io_queues[HCTX_TYPE_READ] = irq_queues - this_w_queues;
+}
+
 static int nvme_create_io_queues(struct nvme_dev *dev)
 {
 	unsigned i, max, rw_queues;
@@ -1761,6 +1785,13 @@ static int nvme_create_io_queues(struct nvme_dev *dev)
 			break;
 	}
 
+	/*
+	 * If we've created less than expected io queues, redistribute the
+	 * dev->io_queues[] types accordingly.
+	 */
+	if (dev->online_queues - 1 != dev->max_qid)
+		nvme_distribute_queues(dev, dev->online_queues - 1);
+
 	/*
 	 * Ignore failing Create SQ/CQ commands, we can continue with less
 	 * than the desired amount of queues, and even a controller without
@@ -2185,11 +2216,6 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 		result = max(result - 1, 1);
 	dev->max_qid = result + dev->io_queues[HCTX_TYPE_POLL];
 
-	dev_info(dev->ctrl.device, "%d/%d/%d default/read/poll queues\n",
-			dev->io_queues[HCTX_TYPE_DEFAULT],
-			dev->io_queues[HCTX_TYPE_READ],
-			dev->io_queues[HCTX_TYPE_POLL]);
-
 	/*
 	 * Should investigate if there's a performance win from allocating
 	 * more queues than interrupt vectors; it might allow the submission
@@ -2203,7 +2229,15 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
 		return result;
 	}
 	set_bit(NVMEQ_ENABLED, &adminq->flags);
-	return nvme_create_io_queues(dev);
+	result = nvme_create_io_queues(dev);
+
+	if (!result)
+		dev_info(dev->ctrl.device, "%d/%d/%d default/read/poll queues\n",
+				dev->io_queues[HCTX_TYPE_DEFAULT],
+				dev->io_queues[HCTX_TYPE_READ],
+				dev->io_queues[HCTX_TYPE_POLL]);
+	return result;
+
 }
 
 static void nvme_del_queue_end(struct request *req, blk_status_t error)

From patchwork Thu Jan 3 22:50:32 2019
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 10747835
X-Patchwork-Delegate: bhelgaas@google.com
From: Keith Busch
To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Ming Lei,
    linux-nvme@lists.infradead.org, Bjorn Helgaas, linux-pci@vger.kernel.org
Cc: Keith Busch
Subject: [PATCHv2 3/4] PCI/MSI: Handle vector reduce and retry
Date: Thu, 3 Jan 2019 15:50:32 -0700
Message-Id: <20190103225033.11249-4-keith.busch@intel.com>
In-Reply-To: <20190103225033.11249-1-keith.busch@intel.com>
References: <20190103225033.11249-1-keith.busch@intel.com>
X-Mailing-List: linux-pci@vger.kernel.org

The struct irq_affinity nr_sets forced the driver to handle reducing
the vector count on allocation failures because the set distribution
counts are driver specific.
The change to this API requires very different usage than before and
introduced new error corner cases that weren't being handled. It is also
less efficient: the driver doesn't actually know what vector count it
should use since it sees only the error code, so it can reduce its
request by just one at a time instead of going straight to a possible
vector count the way PCI is able to do.

Provide a driver-specific callback for managed irq set creation so that
PCI can take min and max vectors as before and handle the reduce and
retry logic. The usage is not particularly obvious for this new feature,
so append documentation for driver usage.

Signed-off-by: Keith Busch
---
 Documentation/PCI/MSI-HOWTO.txt | 36 +++++++++++++++++++++++++++++++++++-
 drivers/pci/msi.c               | 20 ++++++--------------
 include/linux/interrupt.h       |  5 +++++
 3 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 618e13d5e276..391b1f369138 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -98,7 +98,41 @@ The flags argument is used to specify which type of interrupt can be used by
 the device and the driver (PCI_IRQ_LEGACY, PCI_IRQ_MSI, PCI_IRQ_MSIX).
 A convenient short-hand (PCI_IRQ_ALL_TYPES) is also available to ask for
 any possible kind of interrupt.  If the PCI_IRQ_AFFINITY flag is set,
-pci_alloc_irq_vectors() will spread the interrupts around the available CPUs.
+pci_alloc_irq_vectors() will spread the interrupts around the available
+CPUs. Vector affinities allocated under the PCI_IRQ_AFFINITY flag are
+managed by the kernel, and are not tunable from user space like other
+vectors.
+
+When your driver requires a more complex vector affinity configuration
+than a default spread of all vectors, the driver may use the following
+function:
+
+  int pci_alloc_irq_vectors_affinity(struct pci_dev *dev, unsigned int min_vecs,
+				     unsigned int max_vecs, unsigned int flags,
+				     const struct irq_affinity *affd);
+
+The 'struct irq_affinity *affd' allows a driver to specify additional
+characteristics for how a driver wants the vector management to occur. The
+'pre_vectors' and 'post_vectors' fields define how many vectors the driver
+wants to not participate in kernel managed affinities, and whether those
+special vectors are at the beginning or the end of the vector space.
+
+It may also be the case that a driver wants multiple sets of fully
+affinitized vectors. For example, a single PCI function may provide
+different high performance services that want full CPU affinity for each
+service independent of other services. In this case, the driver may use
+the struct irq_affinity's 'nr_sets' field to specify how many groups of
+vectors need to be spread across all the CPUs, and fill in the 'sets'
+array to say how many vectors the driver wants in each set.
+
+When using multiple affinity 'sets', the error handling for vector
+reduction and retry becomes more complicated since the PCI core
+doesn't know how to redistribute the vector count across the sets. In
+order to provide this error handling, the driver must also provide the
+'recalc_sets()' callback and set the 'priv' data needed for the driver
+specific vector distribution. The driver's callback is responsible to
+ensure the sum of the vector counts across its sets matches the new
+vector count that PCI can allocate.
 
 To get the Linux IRQ numbers passed to request_irq() and free_irq() and the
 vectors, use the following function:

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 7a1c8a09efa5..b93ac49be18d 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -1035,13 +1035,6 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 	if (maxvec < minvec)
 		return -ERANGE;
 
-	/*
-	 * If the caller is passing in sets, we can't support a range of
-	 * vectors. The caller needs to handle that.
-	 */
-	if (affd && affd->nr_sets && minvec != maxvec)
-		return -EINVAL;
-
 	if (WARN_ON_ONCE(dev->msi_enabled))
 		return -EINVAL;
@@ -1061,6 +1054,9 @@ static int __pci_enable_msi_range(struct pci_dev *dev, int minvec, int maxvec,
 			return -ENOSPC;
 	}
 
+	if (nvec != maxvec && affd && affd->recalc_sets)
+		affd->recalc_sets((struct irq_affinity *)affd, nvec);
+
 	rc = msi_capability_init(dev, nvec, affd);
 	if (rc == 0)
 		return nvec;
@@ -1093,13 +1089,6 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
 	if (maxvec < minvec)
 		return -ERANGE;
 
-	/*
-	 * If the caller is passing in sets, we can't support a range of
-	 * supported vectors. The caller needs to handle that.
-	 */
-	if (affd && affd->nr_sets && minvec != maxvec)
-		return -EINVAL;
-
 	if (WARN_ON_ONCE(dev->msix_enabled))
 		return -EINVAL;
@@ -1110,6 +1099,9 @@ static int __pci_enable_msix_range(struct pci_dev *dev,
 			return -ENOSPC;
 	}
 
+	if (nvec != maxvec && affd && affd->recalc_sets)
+		affd->recalc_sets((struct irq_affinity *)affd, nvec);
+
 	rc = __pci_enable_msix(dev, entries, nvec, affd);
 	if (rc == 0)
 		return nvec;

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index c672f34235e7..01c06829ff43 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -249,12 +249,17 @@ struct irq_affinity_notify {
  *			the MSI(-X) vector space
  * @nr_sets:		Length of passed in *sets array
  * @sets:		Number of affinitized sets
+ * @recalc_sets:	Recalculate sets if the previously requested allocation
+ *			failed
+ * @priv:		Driver private data
  */
 struct irq_affinity {
 	int	pre_vectors;
 	int	post_vectors;
 	int	nr_sets;
 	int	*sets;
+	void	(*recalc_sets)(struct irq_affinity *, unsigned int);
+	void	*priv;
 };
 
 /**

From patchwork Thu Jan 3 22:50:33 2019
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 10747837
X-Patchwork-Delegate: bhelgaas@google.com
From: Keith Busch
To: Jens Axboe, Christoph Hellwig, Sagi Grimberg, Ming Lei,
    linux-nvme@lists.infradead.org, Bjorn Helgaas, linux-pci@vger.kernel.org
Cc: Keith Busch
Subject: [PATCHv2 4/4] nvme-pci: Use PCI to handle IRQ reduce and retry
Date: Thu, 3 Jan 2019 15:50:33 -0700
Message-Id: <20190103225033.11249-5-keith.busch@intel.com>
In-Reply-To: <20190103225033.11249-1-keith.busch@intel.com>
References: <20190103225033.11249-1-keith.busch@intel.com>
X-Mailing-List: linux-pci@vger.kernel.org

Restore error handling for vector allocation back to the PCI core.
Signed-off-by: Keith Busch
---
 drivers/nvme/host/pci.c | 77 ++++++++++++++-----------------------------------
 1 file changed, 21 insertions(+), 56 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1481bb6d9c42..f3ef09a8e8f9 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2059,37 +2059,43 @@ static int nvme_setup_host_mem(struct nvme_dev *dev)
 	return ret;
 }
 
-static void nvme_calc_io_queues(struct nvme_dev *dev, unsigned int irq_queues)
+static void nvme_calc_io_queues(struct irq_affinity *affd, unsigned int nvecs)
 {
+	struct nvme_dev *dev = affd->priv;
 	unsigned int this_w_queues = write_queues;
 
 	/*
 	 * Setup read/write queue split
 	 */
-	if (irq_queues == 1) {
+	if (nvecs == 1) {
 		dev->io_queues[HCTX_TYPE_DEFAULT] = 1;
 		dev->io_queues[HCTX_TYPE_READ] = 0;
-		return;
+		goto set_sets;
 	}
 
 	/*
 	 * If 'write_queues' is set, ensure it leaves room for at least
 	 * one read queue
 	 */
-	if (this_w_queues >= irq_queues)
-		this_w_queues = irq_queues - 1;
+	if (this_w_queues >= nvecs - 1)
+		this_w_queues = nvecs - 1;
 
 	/*
 	 * If 'write_queues' is set to zero, reads and writes will share
 	 * a queue set.
 	 */
 	if (!this_w_queues) {
-		dev->io_queues[HCTX_TYPE_DEFAULT] = irq_queues;
+		dev->io_queues[HCTX_TYPE_DEFAULT] = nvecs - 1;
 		dev->io_queues[HCTX_TYPE_READ] = 0;
 	} else {
 		dev->io_queues[HCTX_TYPE_DEFAULT] = this_w_queues;
-		dev->io_queues[HCTX_TYPE_READ] = irq_queues - this_w_queues;
+		dev->io_queues[HCTX_TYPE_READ] = nvecs - this_w_queues - 1;
 	}
+set_sets:
+	affd->sets[0] = dev->io_queues[HCTX_TYPE_DEFAULT];
+	affd->sets[1] = dev->io_queues[HCTX_TYPE_READ];
+	if (!affd->sets[1])
+		affd->nr_sets = 1;
 }
 
 static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
@@ -2100,9 +2106,10 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
 		.pre_vectors = 1,
 		.nr_sets = ARRAY_SIZE(irq_sets),
 		.sets = irq_sets,
+		.recalc_sets = nvme_calc_io_queues,
+		.priv = dev,
 	};
-	int result = 0;
-	unsigned int irq_queues, this_p_queues;
+	unsigned int nvecs, this_p_queues;
 
 	/*
 	 * Poll queues don't need interrupts, but we need at least one IO
@@ -2111,56 +2118,14 @@ static int nvme_setup_irqs(struct nvme_dev *dev, unsigned int nr_io_queues)
 	this_p_queues = poll_queues;
 	if (this_p_queues >= nr_io_queues) {
 		this_p_queues = nr_io_queues - 1;
-		irq_queues = 1;
+		nvecs = 2;
 	} else {
-		irq_queues = nr_io_queues - this_p_queues;
+		nvecs = nr_io_queues - this_p_queues + 1;
 	}
 	dev->io_queues[HCTX_TYPE_POLL] = this_p_queues;
-
-	/*
-	 * For irq sets, we have to ask for minvec == maxvec. This passes
-	 * any reduction back to us, so we can adjust our queue counts and
-	 * IRQ vector needs.
-	 */
-	do {
-		nvme_calc_io_queues(dev, irq_queues);
-		irq_sets[0] = dev->io_queues[HCTX_TYPE_DEFAULT];
-		irq_sets[1] = dev->io_queues[HCTX_TYPE_READ];
-		if (!irq_sets[1])
-			affd.nr_sets = 1;
-
-		/*
-		 * If we got a failure and we're down to asking for just
-		 * 1 + 1 queues, just ask for a single vector. We'll share
-		 * that between the single IO queue and the admin queue.
-		 */
-		if (result >= 0 && irq_queues > 1)
-			irq_queues = irq_sets[0] + irq_sets[1] + 1;
-
-		result = pci_alloc_irq_vectors_affinity(pdev, irq_queues,
-				irq_queues,
-				PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
-
-		/*
-		 * Need to reduce our vec counts. If we get ENOSPC, the
-		 * platform should support mulitple vecs, we just need
-		 * to decrease our ask. If we get EINVAL, the platform
-		 * likely does not. Back down to ask for just one vector.
-		 */
-		if (result == -ENOSPC) {
-			irq_queues--;
-			if (!irq_queues)
-				return result;
-			continue;
-		} else if (result == -EINVAL) {
-			irq_queues = 1;
-			continue;
-		} else if (result <= 0)
-			return -EIO;
-		break;
-	} while (1);
-
-	return result;
+	nvme_calc_io_queues(&affd, nvecs);
+	return pci_alloc_irq_vectors_affinity(pdev, affd.pre_vectors, nvecs,
+			PCI_IRQ_ALL_TYPES | PCI_IRQ_AFFINITY, &affd);
 }
 
 static int nvme_setup_io_queues(struct nvme_dev *dev)