Message ID | 20140109153513.21446.31778.stgit@bling.home (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Thu, Jan 9, 2014 at 8:36 AM, Alex Williamson <alex.williamson@redhat.com> wrote: > Per the SR-IOV spec rev 1.1: > > 3.4.1.9 Header Type (Offset 0Eh) > > "... For VFs, this register must be RO Zero." > > Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb > NIC. When they do it makes us handle ACS testing and therefore IOMMU > groups as if they were actual multifunction devices and require ACS > capabilities to make sure there's no peer-to-peer between functions. > VFs are never traditional multifunction devices, so simply clear this > bit before we get any further into setup. This seems reasonable. Do you have "lspci -vvxxxx" output for this device? I'd like to save it for future reference. > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > drivers/pci/iov.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > index 1fe2d6f..e2fbb67 100644 > --- a/drivers/pci/iov.c > +++ b/drivers/pci/iov.c > @@ -84,6 +84,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) > virtfn->dev.parent = dev->dev.parent; > virtfn->physfn = pci_dev_get(dev); > virtfn->is_virtfn = 1; > + virtfn->multifunction = 0; > > for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > res = dev->resource + PCI_IOV_RESOURCES + i; > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-01-09 at 11:08 -0700, Bjorn Helgaas wrote: > On Thu, Jan 9, 2014 at 8:36 AM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > Per the SR-IOV spec rev 1.1: > > > > 3.4.1.9 Header Type (Offset 0Eh) > > > > "... For VFs, this register must be RO Zero." > > > > Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb > > NIC. When they do it makes us handle ACS testing and therefore IOMMU > > groups as if they were actual multifunction devices and require ACS > > capabilities to make sure there's no peer-to-peer between functions. > > VFs are never traditional multifunction devices, so simply clear this > > bit before we get any further into setup. > > This seems reasonable. Do you have "lspci -vvxxxx" output for this > device? I'd like to save it for future reference. Sure, here's a VF: 09:04.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) Subsystem: Emulex Corporation Device e722 Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0 Region 0: [virtual] Memory at df980000 (64-bit, non-prefetchable) [size=16K] Capabilities: [48] MSI-X: Enable+ Count=2 Masked- Vector table: BAR=0 offset=00002000 PBA: BAR=0 offset=00003000 Capabilities: [c0] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <16us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+ RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset- MaxPayload 256 bytes, MaxReadReq 512 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Latency L0 <2us, L1 <16us ClockPM- Surprise- LLActRep- BwNot- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [b8] Vital Product Data Product Name: O?L\x00O102-NM, 10GBIT/S ETHERNET ADAPTER, PCIE, 2P, 10GBASE-SR OPTICAL, NIC VF Read-only fields: [PN] Part number: OCe11102-NM [SN] Serial number: BK20210736 [V0] Vendor specific: BK20210736 [VB] Vendor specific: PW=12W; PCIe 2.0 x8 5GT/s [V1] Vendor specific: Emulex OneConnect OCe11102-NM 2NM 2t PCIe 10GbE CNA [V2] Vendor specific: OCe11102-NM [V4] Vendor specific: 0 [RV] Reserved: checksum bad, 29 byte(s) reserved End Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI) ARICap: MFVC- ACS-, Next Function: 0 ARICtl: MFVC- ACS-, Function Group: 0 Capabilities: [12c v1] Transaction Processing Hints No steering table available Kernel driver in use: vfio-pci 00: ff ff ff ff 04 00 10 00 01 00 00 02 00 00 80 00 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 df 10 22 e7 30: 00 00 00 00 48 00 00 00 00 00 00 00 00 00 00 00 40: 00 00 00 00 00 00 00 00 11 c0 01 80 00 20 00 00 50: 00 30 00 00 00 26 c0 90 41 01 01 c0 41 06 00 ff 60: fc ff f4 00 00 00 00 00 0f 4c a7 09 bf 05 ff ff 70: e0 e0 ff ff ff ff 08 7c 00 00 00 00 00 c0 00 00 80: 11 10 00 00 11 10 00 00 00 00 00 00 00 00 00 00 90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a0: 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 b0: ff ff ff ff ff ff ff ff 03 00 fc 80 00 00 00 78 c0: 10 b8 02 00 02 89 64 10 2e 20 00 00 82 5c 42 00 d0: 40 00 82 10 00 00 00 00 00 00 00 00 00 00 00 00 e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f0: 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > --- > > drivers/pci/iov.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > > index 1fe2d6f..e2fbb67 100644 > > --- a/drivers/pci/iov.c > > +++ b/drivers/pci/iov.c > > @@ -84,6 +84,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) > > virtfn->dev.parent = dev->dev.parent; > > virtfn->physfn = pci_dev_get(dev); > > virtfn->is_virtfn = 1; > > + virtfn->multifunction = 0; > > > > for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > > res = dev->resource + PCI_IOV_RESOURCES + i; > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 9, 2014 at 11:25 AM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Thu, 2014-01-09 at 11:08 -0700, Bjorn Helgaas wrote: >> On Thu, Jan 9, 2014 at 8:36 AM, Alex Williamson >> <alex.williamson@redhat.com> wrote: >> > Per the SR-IOV spec rev 1.1: >> > >> > 3.4.1.9 Header Type (Offset 0Eh) >> > >> > "... For VFs, this register must be RO Zero." >> > >> > Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb >> > NIC. When they do it makes us handle ACS testing and therefore IOMMU >> > groups as if they were actual multifunction devices and require ACS >> > capabilities to make sure there's no peer-to-peer between functions. >> > VFs are never traditional multifunction devices, so simply clear this >> > bit before we get any further into setup. >> >> This seems reasonable. Do you have "lspci -vvxxxx" output for this >> device? I'd like to save it for future reference. > > Sure, here's a VF: > > 09:04.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) > Subsystem: Emulex Corporation Device e722 Thanks! I put this in https://bugzilla.kernel.org/show_bug.cgi?id=68431, and I'll add a reference to the changelog. But I wonder if we can make this slightly more generic by doing something like this in pci_setup_device(): dev->multifunction = (PCI_FUNC(dev->devfn) == 0) && (hdr_type & 0x80); That's basically what lspci does in pci_generic_scan_bus(), and section 3.2.2.3.4 of the PCI 3.0 spec sort of implies that we should only look at the bit 7 of the header type for function 0: If a single function device is detected (i.e., bit 7 in the Header Type register of function 0 is 0), no more functions for that Device Number will be checked. If a multi-function device is detected (i.e., bit 7 in the Header Type register of function 0 is 1), then all remaining Function Numbers will be checked. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-01-09 at 14:39 -0700, Bjorn Helgaas wrote: > On Thu, Jan 9, 2014 at 11:25 AM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > On Thu, 2014-01-09 at 11:08 -0700, Bjorn Helgaas wrote: > >> On Thu, Jan 9, 2014 at 8:36 AM, Alex Williamson > >> <alex.williamson@redhat.com> wrote: > >> > Per the SR-IOV spec rev 1.1: > >> > > >> > 3.4.1.9 Header Type (Offset 0Eh) > >> > > >> > "... For VFs, this register must be RO Zero." > >> > > >> > Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb > >> > NIC. When they do it makes us handle ACS testing and therefore IOMMU > >> > groups as if they were actual multifunction devices and require ACS > >> > capabilities to make sure there's no peer-to-peer between functions. > >> > VFs are never traditional multifunction devices, so simply clear this > >> > bit before we get any further into setup. > >> > >> This seems reasonable. Do you have "lspci -vvxxxx" output for this > >> device? I'd like to save it for future reference. > > > > Sure, here's a VF: > > > > 09:04.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) > > Subsystem: Emulex Corporation Device e722 > > Thanks! I put this in > https://bugzilla.kernel.org/show_bug.cgi?id=68431, and I'll add a > reference to the changelog. > > But I wonder if we can make this slightly more generic by doing > something like this in pci_setup_device(): > > dev->multifunction = (PCI_FUNC(dev->devfn) == 0) && (hdr_type & 0x80); > > That's basically what lspci does in pci_generic_scan_bus(), and > section 3.2.2.3.4 of the PCI 3.0 spec sort of implies that we should > only look at the bit 7 of the header type for function 0: > > If a single function device is detected (i.e., bit 7 in the Header > Type register of function 0 is 0), no more functions for that > Device Number will be checked. If a multi-function device is > detected (i.e., bit 7 in the Header Type register of function 0 > is 1), then all remaining Function Numbers will be checked. We could do that and rely only on pci_scan_slot() to set multifunction=1 for the other functions, but that doesn't completely solve this problem. VFs can occupy function zero and the example device would still set multifunction with that test. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[+to Nishank] On Thu, Jan 9, 2014 at 2:58 PM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Thu, 2014-01-09 at 14:39 -0700, Bjorn Helgaas wrote: >> On Thu, Jan 9, 2014 at 11:25 AM, Alex Williamson >> <alex.williamson@redhat.com> wrote: >> > On Thu, 2014-01-09 at 11:08 -0700, Bjorn Helgaas wrote: >> >> On Thu, Jan 9, 2014 at 8:36 AM, Alex Williamson >> >> <alex.williamson@redhat.com> wrote: >> >> > Per the SR-IOV spec rev 1.1: >> >> > >> >> > 3.4.1.9 Header Type (Offset 0Eh) >> >> > >> >> > "... For VFs, this register must be RO Zero." >> >> > >> >> > Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb >> >> > NIC. When they do it makes us handle ACS testing and therefore IOMMU >> >> > groups as if they were actual multifunction devices and require ACS >> >> > capabilities to make sure there's no peer-to-peer between functions. >> >> > VFs are never traditional multifunction devices, so simply clear this >> >> > bit before we get any further into setup. >> >> >> >> This seems reasonable. Do you have "lspci -vvxxxx" output for this >> >> device? I'd like to save it for future reference. >> > >> > Sure, here's a VF: >> > >> > 09:04.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) >> > Subsystem: Emulex Corporation Device e722 >> >> Thanks! I put this in >> https://bugzilla.kernel.org/show_bug.cgi?id=68431, and I'll add a >> reference to the changelog. >> >> But I wonder if we can make this slightly more generic by doing >> something like this in pci_setup_device(): >> >> dev->multifunction = (PCI_FUNC(dev->devfn) == 0) && (hdr_type & 0x80); >> >> That's basically what lspci does in pci_generic_scan_bus(), and >> section 3.2.2.3.4 of the PCI 3.0 spec sort of implies that we should >> only look at the bit 7 of the header type for function 0: >> >> If a single function device is detected (i.e., bit 7 in the Header >> Type register of function 0 is 0), no more functions for that >> Device Number will be checked. If a multi-function device is >> detected (i.e., bit 7 in the Header Type register of function 0 >> is 1), then all remaining Function Numbers will be checked. > > We could do that and rely only on pci_scan_slot() to set multifunction=1 > for the other functions, but that doesn't completely solve this problem. > VFs can occupy function zero and the example device would still set > multifunction with that test. Thanks, Duh, it would help if I actually paid attention to your lspci output... The reason I'm thinking about this is that virtfn_add() is only used when we enable SR-IOV. If we clear dev->multifunction there, we only end up with the correct value if we start with SR-IOV disabled, and then enable it. If SR-IOV were enabled by the firmware before Linux boots, we wouldn't go through the virtfn_add() path, and dev->multifunction might still be wrong. I'm pretty sure Nishank said there were Cisco boxes that enable SR-IOV in the firmware, but I don't know how that works. It looks like we would disable SR-IOV during enumeration in the path below: pci_scan_slot pci_scan_single_device pci_device_add pci_init_capabilities pci_iov_init sriov_init pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl) if (ctrl & PCI_SRIOV_CTRL_VFE) pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0) From that path, it *looks* like it doesn't really matter whether SR-IOV is enabled at handoff, because we disable it anyway. So I'm not sure if I misunderstood Nishank or what. I think it would be cool if we could enumerate previously-enabled VFs, but maybe there are other issues that would make that impossible. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2014-01-09 at 16:20 -0700, Bjorn Helgaas wrote: > [+to Nishank] > > On Thu, Jan 9, 2014 at 2:58 PM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > On Thu, 2014-01-09 at 14:39 -0700, Bjorn Helgaas wrote: > >> On Thu, Jan 9, 2014 at 11:25 AM, Alex Williamson > >> <alex.williamson@redhat.com> wrote: > >> > On Thu, 2014-01-09 at 11:08 -0700, Bjorn Helgaas wrote: > >> >> On Thu, Jan 9, 2014 at 8:36 AM, Alex Williamson > >> >> <alex.williamson@redhat.com> wrote: > >> >> > Per the SR-IOV spec rev 1.1: > >> >> > > >> >> > 3.4.1.9 Header Type (Offset 0Eh) > >> >> > > >> >> > "... For VFs, this register must be RO Zero." > >> >> > > >> >> > Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb > >> >> > NIC. When they do it makes us handle ACS testing and therefore IOMMU > >> >> > groups as if they were actual multifunction devices and require ACS > >> >> > capabilities to make sure there's no peer-to-peer between functions. > >> >> > VFs are never traditional multifunction devices, so simply clear this > >> >> > bit before we get any further into setup. > >> >> > >> >> This seems reasonable. Do you have "lspci -vvxxxx" output for this > >> >> device? I'd like to save it for future reference. > >> > > >> > Sure, here's a VF: > >> > > >> > 09:04.0 Ethernet controller: Emulex Corporation OneConnect 10Gb NIC (be3) (rev 01) > >> > Subsystem: Emulex Corporation Device e722 > >> > >> Thanks! I put this in > >> https://bugzilla.kernel.org/show_bug.cgi?id=68431, and I'll add a > >> reference to the changelog. > >> > >> But I wonder if we can make this slightly more generic by doing > >> something like this in pci_setup_device(): > >> > >> dev->multifunction = (PCI_FUNC(dev->devfn) == 0) && (hdr_type & 0x80); > >> > >> That's basically what lspci does in pci_generic_scan_bus(), and > >> section 3.2.2.3.4 of the PCI 3.0 spec sort of implies that we should > >> only look at the bit 7 of the header type for function 0: > >> > >> If a single function device is detected (i.e., bit 7 in the Header > >> Type register of function 0 is 0), no more functions for that > >> Device Number will be checked. If a multi-function device is > >> detected (i.e., bit 7 in the Header Type register of function 0 > >> is 1), then all remaining Function Numbers will be checked. > > > > We could do that and rely only on pci_scan_slot() to set multifunction=1 > > for the other functions, but that doesn't completely solve this problem. > > VFs can occupy function zero and the example device would still set > > multifunction with that test. Thanks, > > Duh, it would help if I actually paid attention to your lspci output... > > The reason I'm thinking about this is that virtfn_add() is only used > when we enable SR-IOV. If we clear dev->multifunction there, we only > end up with the correct value if we start with SR-IOV disabled, and > then enable it. > > If SR-IOV were enabled by the firmware before Linux boots, we wouldn't > go through the virtfn_add() path, and dev->multifunction might still > be wrong. > > I'm pretty sure Nishank said there were Cisco boxes that enable SR-IOV > in the firmware, but I don't know how that works. It looks like we > would disable SR-IOV during enumeration in the path below: > > pci_scan_slot > pci_scan_single_device > pci_device_add > pci_init_capabilities > pci_iov_init > sriov_init > pci_read_config_word(dev, pos + PCI_SRIOV_CTRL, &ctrl) > if (ctrl & PCI_SRIOV_CTRL_VFE) > pci_write_config_word(dev, pos + PCI_SRIOV_CTRL, 0) > > From that path, it *looks* like it doesn't really matter whether > SR-IOV is enabled at handoff, because we disable it anyway. > > So I'm not sure if I misunderstood Nishank or what. I think it would > be cool if we could enumerate previously-enabled VFs, but maybe there > are other issues that would make that impossible. VFs don't have vendor/device IDs, that's provided by the PF SR-IOV capability. We'd need to go reverse engineer which PF they came from to set that up, so it doesn't really seem worthwhile. I have seen devices with various modes of operation, SR-IOV or multifunction. Depending on how the device firmware is configured they can pretend to be something that looks more like a traditional multifunction device or generate VFs. Those typically exist to work around system firmware that doesn't support SR-IOV and doesn't leave properly sized apertures. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 9, 2014 at 8:36 AM, Alex Williamson <alex.williamson@redhat.com> wrote: > Per the SR-IOV spec rev 1.1: > > 3.4.1.9 Header Type (Offset 0Eh) > > "... For VFs, this register must be RO Zero." > > Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb > NIC. When they do it makes us handle ACS testing and therefore IOMMU > groups as if they were actual multifunction devices and require ACS > capabilities to make sure there's no peer-to-peer between functions. > VFs are never traditional multifunction devices, so simply clear this > bit before we get any further into setup. > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Applied to pci/misc for v3.14, thanks! Bjorn > --- > drivers/pci/iov.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c > index 1fe2d6f..e2fbb67 100644 > --- a/drivers/pci/iov.c > +++ b/drivers/pci/iov.c > @@ -84,6 +84,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) > virtfn->dev.parent = dev->dev.parent; > virtfn->physfn = pci_dev_get(dev); > virtfn->is_virtfn = 1; > + virtfn->multifunction = 0; > > for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { > res = dev->resource + PCI_IOV_RESOURCES + i; > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index 1fe2d6f..e2fbb67 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -84,6 +84,7 @@ static int virtfn_add(struct pci_dev *dev, int id, int reset) virtfn->dev.parent = dev->dev.parent; virtfn->physfn = pci_dev_get(dev); virtfn->is_virtfn = 1; + virtfn->multifunction = 0; for (i = 0; i < PCI_SRIOV_NUM_BARS; i++) { res = dev->resource + PCI_IOV_RESOURCES + i;
Per the SR-IOV spec rev 1.1: 3.4.1.9 Header Type (Offset 0Eh) "... For VFs, this register must be RO Zero." Unfortunately some devices get this wrong, ex. Emulex OneConnect 10Gb NIC. When they do it makes us handle ACS testing and therefore IOMMU groups as if they were actual multifunction devices and require ACS capabilities to make sure there's no peer-to-peer between functions. VFs are never traditional multifunction devices, so simply clear this bit before we get any further into setup. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- drivers/pci/iov.c | 1 + 1 file changed, 1 insertion(+) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html