From patchwork Fri Aug 23 05:00:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Akihiko Odaki X-Patchwork-Id: 13774604 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id CAAB5C531DC for ; Fri, 23 Aug 2024 05:02:21 +0000 (UTC) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1shMQM-0006W2-3m; Fri, 23 Aug 2024 01:01:22 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1shMQI-0006MX-Oo for qemu-devel@nongnu.org; Fri, 23 Aug 2024 01:01:18 -0400 Received: from mail-pj1-x1031.google.com ([2607:f8b0:4864:20::1031]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1shMQG-0008R3-GT for qemu-devel@nongnu.org; Fri, 23 Aug 2024 01:01:18 -0400 Received: by mail-pj1-x1031.google.com with SMTP id 98e67ed59e1d1-2d1daa2577bso1187810a91.2 for ; Thu, 22 Aug 2024 22:01:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=daynix-com.20230601.gappssmtp.com; s=20230601; t=1724389274; x=1724994074; darn=nongnu.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=ANdtsuA/TGl02EfU5lGEDU+SzPTDlHokeLE9rDoObAs=; b=2F65V/6F8ouH7QpfN8NW8+wGPSvOSUXKbE8DEDUaZjq8fbKZQcfZFc0ExWsNIRraTb EMtGUE5uKZOTbJXXFoDoCr4oc7RYxA2Djg5ZJHFijRUuTYzZgj26T0o/72xPN7Xn/GV1 h4IVbXm/jibwvGAa/SS1+JWcu3QmkRqKEoP2RwZqOkwyK4Q4i1JF+wZGEiskyilR8FtH n7vbU5HzRpbFV1jLBeCJgw7DlKk1VnY0W9mheknwazOcw9ATbzj/VY/C1yx5Azg4vynx +EcnEqDEpIeqcLNGx6USzltWEP7lvLekWWjQeX01fKTqFr0TDF4vymvq9POqQZt0VwPh JgJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724389274; x=1724994074; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ANdtsuA/TGl02EfU5lGEDU+SzPTDlHokeLE9rDoObAs=; b=TEtx/nOGfRk/izXW0+yTnj6Jd5MHDUOACPTo63oddkdKf7pUhpL0/cnuL0xlzxoYYI b/re9BuT+h/G1TK34W+znuz+AE0HxrU50zX5Rw/FbwTuq4Gn7xe98y42Wf0n78jprYZb KNq0+DXFEnmYlLzK7G85V2tH1/zulvaUyDHRILNbkH4OQvOWdaJY94oM+SDmnHRAb3pU 5+HcNzh265d8HcNRN+jjz21o7ZyFE+zCf7nJUXx/ThXYGyNSyqHB48dSWyMWqQ7MXY+Y ZqeDkVeiyu81hiPrkckFlsqJBatXIe3a1YgKvsTCSNKs2i+AUCORj4+SGsTi+61fE3lk tdeQ== X-Gm-Message-State: AOJu0YwNiOnZT/U7DIlDfyhHy+qgIToM4MQhtTLgF9k1DFgtE/Ye+WwX 9RtZiI3DKteaRhsdkpM08osU+vW4TfgNYDJm4NFLRfRbZdCgu+Vl2pecd1sb8S8= X-Google-Smtp-Source: AGHT+IG1OlsQ4yjHStqsecRT9Nh8NUgyudxZ2pC214zgCF/d5qDvMKVeEm11AISUTvin2bsyj0YWcw== X-Received: by 2002:a17:90a:c70d:b0:2d3:cb16:c8e with SMTP id 98e67ed59e1d1-2d646d867bcmr810977a91.43.1724389274068; Thu, 22 Aug 2024 22:01:14 -0700 (PDT) Received: from localhost ([157.82.207.23]) by smtp.gmail.com with UTF8SMTPSA id 98e67ed59e1d1-2d5eba287e7sm5180465a91.34.2024.08.22.22.01.10 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 22 Aug 2024 22:01:13 -0700 (PDT) From: Akihiko Odaki Date: Fri, 23 Aug 2024 14:00:43 +0900 Subject: [PATCH for-9.2 v15 06/11] pcie_sriov: Reuse SR-IOV VF device instances MIME-Version: 1.0 Message-Id: <20240823-reuse-v15-6-eddcb960e289@daynix.com> References: <20240823-reuse-v15-0-eddcb960e289@daynix.com> In-Reply-To: <20240823-reuse-v15-0-eddcb960e289@daynix.com> To: =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= , "Michael S. Tsirkin" , Marcel Apfelbaum , Alex Williamson , =?utf-8?q?C=C3=A9dric_Le_Goa?= =?utf-8?q?ter?= , Paolo Bonzini , =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= , Eduardo Habkost , Sriram Yagnaraman , Jason Wang , Keith Busch , Klaus Jensen , Markus Armbruster Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, Akihiko Odaki X-Mailer: b4 0.14-dev-fd6e3 Received-SPF: none client-ip=2607:f8b0:4864:20::1031; envelope-from=akihiko.odaki@daynix.com; helo=mail-pj1-x1031.google.com X-Spam_score_int: -18 X-Spam_score: -1.9 X-Spam_bar: - X-Spam_report: (-1.9 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Disable SR-IOV VF devices by reusing code to power down PCI devices instead of removing them when the guest requests to disable VFs. This allows to realize devices and report VF realization errors at PF realization time. Signed-off-by: Akihiko Odaki --- docs/pcie_sriov.txt | 8 ++-- include/hw/pci/pci.h | 5 --- include/hw/pci/pci_device.h | 15 +++++++ include/hw/pci/pcie_sriov.h | 6 +-- hw/net/igb.c | 13 ++++-- hw/nvme/ctrl.c | 24 +++++++---- hw/pci/pci.c | 2 +- hw/pci/pcie_sriov.c | 102 +++++++++++++++++++------------------------- 8 files changed, 95 insertions(+), 80 deletions(-) diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt index a47aad0bfab0..ab2142807f79 100644 --- a/docs/pcie_sriov.txt +++ b/docs/pcie_sriov.txt @@ -52,9 +52,11 @@ setting up a BAR for a VF. ... /* Add and initialize the SR/IOV capability */ - pcie_sriov_pf_init(d, 0x200, "your_virtual_dev", - vf_devid, initial_vfs, total_vfs, - fun_offset, stride); + if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev", + vf_devid, initial_vfs, total_vfs, + fun_offset, stride, errp)) { + return; + } /* Set up individual VF BARs (parameters as for normal BARs) */ pcie_sriov_pf_init_vf_bar( ... ) diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index fe04b4fafd04..14a869eeaa71 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -680,9 +680,4 @@ static inline void pci_irq_pulse(PCIDevice *pci_dev) MSIMessage pci_get_msi_message(PCIDevice *dev, int vector); void pci_set_enabled(PCIDevice *pci_dev, bool state); -static inline void pci_set_power(PCIDevice *pci_dev, bool state) -{ - pci_set_enabled(pci_dev, state); -} - #endif diff --git a/include/hw/pci/pci_device.h b/include/hw/pci/pci_device.h index f38fb3111954..1ff3ce94e25b 100644 --- a/include/hw/pci/pci_device.h +++ b/include/hw/pci/pci_device.h @@ -212,6 +212,21 @@ static inline uint16_t pci_get_bdf(PCIDevice *dev) return PCI_BUILD_BDF(pci_bus_num(pci_get_bus(dev)), dev->devfn); } +static inline void pci_set_power(PCIDevice *pci_dev, bool state) +{ + /* + * Don't change the enabled state of VFs when powering on/off the device. + * + * When powering on, VFs must not be enabled immediately but they must + * wait until the guest configures SR-IOV. + * When powering off, their corresponding PFs will be reset and disable + * VFs. + */ + if (!pci_is_vf(pci_dev)) { + pci_set_enabled(pci_dev, state); + } +} + uint16_t pci_requester_id(PCIDevice *dev); /* DMA access functions */ diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h index 450cbef6c201..70649236c18a 100644 --- a/include/hw/pci/pcie_sriov.h +++ b/include/hw/pci/pcie_sriov.h @@ -18,7 +18,6 @@ typedef struct PCIESriovPF { uint16_t num_vfs; /* Number of virtual functions created */ uint8_t vf_bar_type[PCI_NUM_REGIONS]; /* Store type for each VF bar */ - const char *vfname; /* Reference to the device type used for the VFs */ PCIDevice **vf; /* Pointer to an array of num_vfs VF devices */ } PCIESriovPF; @@ -27,10 +26,11 @@ typedef struct PCIESriovVF { uint16_t vf_number; /* Logical VF number of this function */ } PCIESriovVF; -void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, const char *vfname, uint16_t vf_dev_id, uint16_t init_vfs, uint16_t total_vfs, - uint16_t vf_offset, uint16_t vf_stride); + uint16_t vf_offset, uint16_t vf_stride, + Error **errp); void pcie_sriov_pf_exit(PCIDevice *dev); /* Set up a VF bar in the SR/IOV bar area */ diff --git a/hw/net/igb.c b/hw/net/igb.c index b92bba402e0d..b6ca2f1b8aee 100644 --- a/hw/net/igb.c +++ b/hw/net/igb.c @@ -446,9 +446,16 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error **errp) pcie_ari_init(pci_dev, 0x150); - pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF, - IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS, - IGB_VF_OFFSET, IGB_VF_STRIDE); + if (!pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, + TYPE_IGBVF, IGB_82576_VF_DEV_ID, + IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS, + IGB_VF_OFFSET, IGB_VF_STRIDE, + errp)) { + pcie_cap_exit(pci_dev); + igb_cleanup_msix(s); + msi_uninit(pci_dev); + return; + } pcie_sriov_pf_init_vf_bar(pci_dev, IGBVF_MMIO_BAR_IDX, PCI_BASE_ADDRESS_MEM_TYPE_64 | PCI_BASE_ADDRESS_MEM_PREFETCH, diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c index c6d4f61a47f9..e86ea2e7ce57 100644 --- a/hw/nvme/ctrl.c +++ b/hw/nvme/ctrl.c @@ -8271,7 +8271,8 @@ out: return pow2ceil(bar_size); } -static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset) +static bool nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset, + Error **errp) { uint16_t vf_dev_id = n->params.use_intel_id ? PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME; @@ -8280,12 +8281,17 @@ static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset) le16_to_cpu(cap->vifrsm), NULL, NULL); - pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id, - n->params.sriov_max_vfs, n->params.sriov_max_vfs, - NVME_VF_OFFSET, NVME_VF_STRIDE); + if (!pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id, + n->params.sriov_max_vfs, n->params.sriov_max_vfs, + NVME_VF_OFFSET, NVME_VF_STRIDE, + errp)) { + return false; + } pcie_sriov_pf_init_vf_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_TYPE_64, bar_size); + + return true; } static int nvme_add_pm_capability(PCIDevice *pci_dev, uint8_t offset) @@ -8410,6 +8416,12 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) return false; } + if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs && + !nvme_init_sriov(n, pci_dev, 0x120, errp)) { + msix_uninit(pci_dev, &n->bar0, &n->bar0); + return false; + } + nvme_update_msixcap_ts(pci_dev, n->conf_msix_qsize); pcie_cap_deverr_init(pci_dev); @@ -8439,10 +8451,6 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice *pci_dev, Error **errp) nvme_init_pmr(n, pci_dev); } - if (!pci_is_vf(pci_dev) && n->params.sriov_max_vfs) { - nvme_init_sriov(n, pci_dev, 0x120); - } - return true; } diff --git a/hw/pci/pci.c b/hw/pci/pci.c index b532888e8f6c..5c0050e1786a 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -2895,7 +2895,7 @@ void pci_set_enabled(PCIDevice *d, bool state) memory_region_set_enabled(&d->bus_master_enable_region, (pci_get_word(d->config + PCI_COMMAND) & PCI_COMMAND_MASTER) && d->enabled); - if (!d->enabled) { + if (d->qdev.realized) { pci_device_reset(d); } } diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c index 499becd5273f..4bffe6c97f66 100644 --- a/hw/pci/pcie_sriov.c +++ b/hw/pci/pcie_sriov.c @@ -20,15 +20,25 @@ #include "qapi/error.h" #include "trace.h" -static PCIDevice *register_vf(PCIDevice *pf, int devfn, - const char *name, uint16_t vf_num); -static void unregister_vfs(PCIDevice *dev); +static void unparent_vfs(PCIDevice *dev, uint16_t total_vfs) +{ + for (uint16_t i = 0; i < total_vfs; i++) { + PCIDevice *vf = dev->exp.sriov_pf.vf[i]; + object_unparent(OBJECT(vf)); + object_unref(OBJECT(vf)); + } + g_free(dev->exp.sriov_pf.vf); + dev->exp.sriov_pf.vf = NULL; +} -void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, +bool pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, const char *vfname, uint16_t vf_dev_id, uint16_t init_vfs, uint16_t total_vfs, - uint16_t vf_offset, uint16_t vf_stride) + uint16_t vf_offset, uint16_t vf_stride, + Error **errp) { + BusState *bus = qdev_get_parent_bus(&dev->qdev); + int32_t devfn = dev->devfn + vf_offset; uint8_t *cfg = dev->config + offset; uint8_t *wmask; @@ -36,7 +46,6 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, offset, PCI_EXT_CAP_SRIOV_SIZEOF); dev->exp.sriov_cap = offset; dev->exp.sriov_pf.num_vfs = 0; - dev->exp.sriov_pf.vfname = g_strdup(vfname); dev->exp.sriov_pf.vf = NULL; pci_set_word(cfg + PCI_SRIOV_VF_OFFSET, vf_offset); @@ -69,13 +78,35 @@ void pcie_sriov_pf_init(PCIDevice *dev, uint16_t offset, pci_set_word(wmask + PCI_SRIOV_SYS_PGSIZE, 0x553); qdev_prop_set_bit(&dev->qdev, "multifunction", true); + + dev->exp.sriov_pf.vf = g_new(PCIDevice *, total_vfs); + + for (uint16_t i = 0; i < total_vfs; i++) { + PCIDevice *vf = pci_new(devfn, vfname); + vf->exp.sriov_vf.pf = dev; + vf->exp.sriov_vf.vf_number = i; + + if (!qdev_realize(&vf->qdev, bus, errp)) { + unparent_vfs(dev, i); + return false; + } + + /* set vid/did according to sr/iov spec - they are not used */ + pci_config_set_vendor_id(vf->config, 0xffff); + pci_config_set_device_id(vf->config, 0xffff); + + dev->exp.sriov_pf.vf[i] = vf; + devfn += vf_stride; + } + + return true; } void pcie_sriov_pf_exit(PCIDevice *dev) { - unregister_vfs(dev); - g_free((char *)dev->exp.sriov_pf.vfname); - dev->exp.sriov_pf.vfname = NULL; + uint8_t *cfg = dev->config + dev->exp.sriov_cap; + + unparent_vfs(dev, pci_get_word(cfg + PCI_SRIOV_TOTAL_VF)); } void pcie_sriov_pf_init_vf_bar(PCIDevice *dev, int region_num, @@ -141,38 +172,11 @@ void pcie_sriov_vf_register_bar(PCIDevice *dev, int region_num, } } -static PCIDevice *register_vf(PCIDevice *pf, int devfn, const char *name, - uint16_t vf_num) -{ - PCIDevice *dev = pci_new(devfn, name); - dev->exp.sriov_vf.pf = pf; - dev->exp.sriov_vf.vf_number = vf_num; - PCIBus *bus = pci_get_bus(pf); - Error *local_err = NULL; - - qdev_realize(&dev->qdev, &bus->qbus, &local_err); - if (local_err) { - error_report_err(local_err); - return NULL; - } - - /* set vid/did according to sr/iov spec - they are not used */ - pci_config_set_vendor_id(dev->config, 0xffff); - pci_config_set_device_id(dev->config, 0xffff); - - return dev; -} - static void register_vfs(PCIDevice *dev) { uint16_t num_vfs; uint16_t i; uint16_t sriov_cap = dev->exp.sriov_cap; - uint16_t vf_offset = - pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_OFFSET); - uint16_t vf_stride = - pci_get_word(dev->config + sriov_cap + PCI_SRIOV_VF_STRIDE); - int32_t devfn = dev->devfn + vf_offset; assert(sriov_cap > 0); num_vfs = pci_get_word(dev->config + sriov_cap + PCI_SRIOV_NUM_VF); @@ -180,18 +184,10 @@ static void register_vfs(PCIDevice *dev) return; } - dev->exp.sriov_pf.vf = g_new(PCIDevice *, num_vfs); - trace_sriov_register_vfs(dev->name, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), num_vfs); for (i = 0; i < num_vfs; i++) { - dev->exp.sriov_pf.vf[i] = register_vf(dev, devfn, - dev->exp.sriov_pf.vfname, i); - if (!dev->exp.sriov_pf.vf[i]) { - num_vfs = i; - break; - } - devfn += vf_stride; + pci_set_enabled(dev->exp.sriov_pf.vf[i], true); } dev->exp.sriov_pf.num_vfs = num_vfs; } @@ -204,12 +200,8 @@ static void unregister_vfs(PCIDevice *dev) trace_sriov_unregister_vfs(dev->name, PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn), num_vfs); for (i = 0; i < num_vfs; i++) { - PCIDevice *vf = dev->exp.sriov_pf.vf[i]; - object_unparent(OBJECT(vf)); - object_unref(OBJECT(vf)); + pci_set_enabled(dev->exp.sriov_pf.vf[i], false); } - g_free(dev->exp.sriov_pf.vf); - dev->exp.sriov_pf.vf = NULL; dev->exp.sriov_pf.num_vfs = 0; } @@ -231,14 +223,10 @@ void pcie_sriov_config_write(PCIDevice *dev, uint32_t address, PCI_FUNC(dev->devfn), off, val, len); if (range_covers_byte(off, len, PCI_SRIOV_CTRL)) { - if (dev->exp.sriov_pf.num_vfs) { - if (!(val & PCI_SRIOV_CTRL_VFE)) { - unregister_vfs(dev); - } + if (val & PCI_SRIOV_CTRL_VFE) { + register_vfs(dev); } else { - if (val & PCI_SRIOV_CTRL_VFE) { - register_vfs(dev); - } + unregister_vfs(dev); } } }