Message ID | 20120804181445.6598.6505.stgit@bling.home (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On 8/4/12 12:19 PM, Alex Williamson wrote: > It's possible to have buses without an associated bridge > (bus->self == NULL). SR-IOV can generate such buses. When > we find these, skip to the parent bus to look for the next > ACS test. > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > > David Ahern reported an oops from iommu drivers passing NULL into > this function for the same mistake. Harden this function against > assuming bus->self is valid as well. David, please include this > patch as well as the iommu patches in your testing. Tested-by: David Ahern <dsahern@gmail.com> -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson <alex.williamson@redhat.com> wrote: > It's possible to have buses without an associated bridge > (bus->self == NULL). SR-IOV can generate such buses. When > we find these, skip to the parent bus to look for the next > ACS test. To make sure I understand the problem here, I think you're referring to the situation where an SR-IOV device can span several bus numbers, e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in the SR-IOV 1.1 spec, sec. 2.1.2. It says "All PFs must be located on the Device's captured Bus Number" -- I think that means every PF will be directly on a bridge's secondary bus and hence will have a valid dev->bus->self pointer. However, VFs need not be on the same bus number. If a VF is on (captured Bus Number plus 1), I think we allocate a new struct pci_bus for it, but there's no P2P bridge that leads to that bus, so the bus->self pointer is probably NULL. This makes me quite nervous, because I bet there are many places that assume every non-root bus has a valid bus->self pointer -- I know I certainly had that assumption. I looked at callers of pci_is_root_bus(), and at first glance, it seems like iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), pci_get_interrupt_pin(), pci_common_swizzle(), pci_find_upstream_pcie_bridge(), and pci_bus_release_bridge_resources() all might have similar problems. > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > --- > > David Ahern reported an oops from iommu drivers passing NULL into > this function for the same mistake. Harden this function against > assuming bus->self is valid as well. David, please include this > patch as well as the iommu patches in your testing. > > drivers/pci/pci.c | 22 +++++++++++++++++----- > 1 file changed, 17 insertions(+), 5 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index f3ea977..e11a49c 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -2486,18 +2486,30 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) > bool pci_acs_path_enabled(struct pci_dev *start, > struct pci_dev *end, u16 acs_flags) > { > - struct pci_dev *pdev, *parent = start; > + struct pci_dev *pdev = start; > + struct pci_bus *bus; > > do { > - pdev = parent; > - > if (!pci_acs_enabled(pdev, acs_flags)) > return false; > > - if (pci_is_root_bus(pdev->bus)) > + bus = pdev->bus; > + > + if (pci_is_root_bus(bus)) > return (end == NULL); > > - parent = pdev->bus->self; > + /* > + * Skip buses without an associated bridge. In this > + * case move to the parent and continue. > + */ > + while (!bus->self) { > + if (!pci_is_root_bus(bus)) > + bus = bus->parent; > + else > + return (end == NULL); > + } > + > + pdev = bus->self; > } while (pdev != end); > > return true; > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2012-08-05 at 23:30 -0600, Bjorn Helgaas wrote: > On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson > <alex.williamson@redhat.com> wrote: > > It's possible to have buses without an associated bridge > > (bus->self == NULL). SR-IOV can generate such buses. When > > we find these, skip to the parent bus to look for the next > > ACS test. > > To make sure I understand the problem here, I think you're referring > to the situation where an SR-IOV device can span several bus numbers, > e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in > the SR-IOV 1.1 spec, sec. 2.1.2. > > It says "All PFs must be located on the Device's captured Bus Number" > -- I think that means every PF will be directly on a bridge's > secondary bus and hence will have a valid dev->bus->self pointer. > > However, VFs need not be on the same bus number. If a VF is on > (captured Bus Number plus 1), I think we allocate a new struct pci_bus > for it, but there's no P2P bridge that leads to that bus, so the > bus->self pointer is probably NULL. Yes, exactly. virtfn_add_bus() is where we're creating this new bus. > This makes me quite nervous, because I bet there are many places that > assume every non-root bus has a valid bus->self pointer -- I know I > certainly had that assumption. > > I looked at callers of pci_is_root_bus(), and at first glance, it seems like > iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), These 3 are handled by this patch, plus the intel and amd iommu patches I sent. > pci_get_interrupt_pin(), pci_common_swizzle(), If sr-iov is the only source of these virtual buses, these are probably ok since VFs don't support INTx. > pci_find_upstream_pcie_bridge(), and Here the pci_is_root_bus() is after a pci_is_pcie() check, so again if sr-iov only (and assuming VFs properly report PCIe capability), we shouldn't stumble on it. > pci_bus_release_bridge_resources() all might have similar problems. This one might deserve further investigation. Thanks, Alex > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> > > --- > > > > David Ahern reported an oops from iommu drivers passing NULL into > > this function for the same mistake. Harden this function against > > assuming bus->self is valid as well. David, please include this > > patch as well as the iommu patches in your testing. > > > > drivers/pci/pci.c | 22 +++++++++++++++++----- > > 1 file changed, 17 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index f3ea977..e11a49c 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -2486,18 +2486,30 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) > > bool pci_acs_path_enabled(struct pci_dev *start, > > struct pci_dev *end, u16 acs_flags) > > { > > - struct pci_dev *pdev, *parent = start; > > + struct pci_dev *pdev = start; > > + struct pci_bus *bus; > > > > do { > > - pdev = parent; > > - > > if (!pci_acs_enabled(pdev, acs_flags)) > > return false; > > > > - if (pci_is_root_bus(pdev->bus)) > > + bus = pdev->bus; > > + > > + if (pci_is_root_bus(bus)) > > return (end == NULL); > > > > - parent = pdev->bus->self; > > + /* > > + * Skip buses without an associated bridge. In this > > + * case move to the parent and continue. > > + */ > > + while (!bus->self) { > > + if (!pci_is_root_bus(bus)) > > + bus = bus->parent; > > + else > > + return (end == NULL); > > + } > > + > > + pdev = bus->self; > > } while (pdev != end); > > > > return true; > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Aug 5, 2012 at 11:55 PM, Alex Williamson <alex.williamson@redhat.com> wrote: > On Sun, 2012-08-05 at 23:30 -0600, Bjorn Helgaas wrote: >> On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson >> <alex.williamson@redhat.com> wrote: >> > It's possible to have buses without an associated bridge >> > (bus->self == NULL). SR-IOV can generate such buses. When >> > we find these, skip to the parent bus to look for the next >> > ACS test. >> >> To make sure I understand the problem here, I think you're referring >> to the situation where an SR-IOV device can span several bus numbers, >> e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in >> the SR-IOV 1.1 spec, sec. 2.1.2. >> >> It says "All PFs must be located on the Device's captured Bus Number" >> -- I think that means every PF will be directly on a bridge's >> secondary bus and hence will have a valid dev->bus->self pointer. >> >> However, VFs need not be on the same bus number. If a VF is on >> (captured Bus Number plus 1), I think we allocate a new struct pci_bus >> for it, but there's no P2P bridge that leads to that bus, so the >> bus->self pointer is probably NULL. > > Yes, exactly. virtfn_add_bus() is where we're creating this new bus. > >> This makes me quite nervous, because I bet there are many places that >> assume every non-root bus has a valid bus->self pointer -- I know I >> certainly had that assumption. >> >> I looked at callers of pci_is_root_bus(), and at first glance, it seems like >> iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), > > > These 3 are handled by this patch, plus the intel and amd iommu patches > I sent. > >> pci_get_interrupt_pin(), pci_common_swizzle(), > > If sr-iov is the only source of these virtual buses, these are probably > ok since VFs don't support INTx. > >> pci_find_upstream_pcie_bridge(), and > > Here the pci_is_root_bus() is after a pci_is_pcie() check, so again if > sr-iov only (and assuming VFs properly report PCIe capability), we > shouldn't stumble on it. > >> pci_bus_release_bridge_resources() all might have similar problems. > > This one might deserve further investigation. Thanks, We can fix all these places piecemeal, but that doesn't feel like a very satisfying solution. It makes it much harder to know that each place is correct, and this oddity of a bus with no upstream bridge is still lying around, waiting to bite us again later. What other possible ways of fixing this do we have? Could we set bus->self (multiple buses would then point to the same bridge, and I don't know if that would break something)? Add something like a pci_upstream_p2p_bridge() interface that would encapsulate traversing the bus->parent and bus->self links? Since these fake VF buses don't have a bridge that points to them, I think the only place we keep a pointer to them is in the parent bus's "children" list (updated in pci_add_new_bus()). And now I'm confused about when we should use bus->children and when we should use bus->devices and why we should have both. Does pci_walk_bus() work correctly with these VFs on fake buses? It doesn't use "children", so I can't see how it would ever find them. Aren't you sorry you opened this can of worms? :) >> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com> >> > --- >> > >> > David Ahern reported an oops from iommu drivers passing NULL into >> > this function for the same mistake. Harden this function against >> > assuming bus->self is valid as well. David, please include this >> > patch as well as the iommu patches in your testing. >> > >> > drivers/pci/pci.c | 22 +++++++++++++++++----- >> > 1 file changed, 17 insertions(+), 5 deletions(-) >> > >> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> > index f3ea977..e11a49c 100644 >> > --- a/drivers/pci/pci.c >> > +++ b/drivers/pci/pci.c >> > @@ -2486,18 +2486,30 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) >> > bool pci_acs_path_enabled(struct pci_dev *start, >> > struct pci_dev *end, u16 acs_flags) >> > { >> > - struct pci_dev *pdev, *parent = start; >> > + struct pci_dev *pdev = start; >> > + struct pci_bus *bus; >> > >> > do { >> > - pdev = parent; >> > - >> > if (!pci_acs_enabled(pdev, acs_flags)) >> > return false; >> > >> > - if (pci_is_root_bus(pdev->bus)) >> > + bus = pdev->bus; >> > + >> > + if (pci_is_root_bus(bus)) >> > return (end == NULL); >> > >> > - parent = pdev->bus->self; >> > + /* >> > + * Skip buses without an associated bridge. In this >> > + * case move to the parent and continue. >> > + */ >> > + while (!bus->self) { >> > + if (!pci_is_root_bus(bus)) >> > + bus = bus->parent; >> > + else >> > + return (end == NULL); >> > + } >> > + >> > + pdev = bus->self; >> > } while (pdev != end); >> > >> > return true; >> > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/06/2012 04:47 PM, Bjorn Helgaas wrote: > On Sun, Aug 5, 2012 at 11:55 PM, Alex Williamson > <alex.williamson@redhat.com> wrote: >> On Sun, 2012-08-05 at 23:30 -0600, Bjorn Helgaas wrote: >>> On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson >>> <alex.williamson@redhat.com> wrote: >>>> It's possible to have buses without an associated bridge >>>> (bus->self == NULL). SR-IOV can generate such buses. When >>>> we find these, skip to the parent bus to look for the next >>>> ACS test. >>> >>> To make sure I understand the problem here, I think you're referring >>> to the situation where an SR-IOV device can span several bus numbers, >>> e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in >>> the SR-IOV 1.1 spec, sec. 2.1.2. >>> >>> It says "All PFs must be located on the Device's captured Bus Number" >>> -- I think that means every PF will be directly on a bridge's >>> secondary bus and hence will have a valid dev->bus->self pointer. >>> >>> However, VFs need not be on the same bus number. If a VF is on >>> (captured Bus Number plus 1), I think we allocate a new struct pci_bus >>> for it, but there's no P2P bridge that leads to that bus, so the >>> bus->self pointer is probably NULL. >> >> Yes, exactly. virtfn_add_bus() is where we're creating this new bus. >> >>> This makes me quite nervous, because I bet there are many places that >>> assume every non-root bus has a valid bus->self pointer -- I know I >>> certainly had that assumption. >>> >>> I looked at callers of pci_is_root_bus(), and at first glance, it seems like >>> iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), >> >> >> These 3 are handled by this patch, plus the intel and amd iommu patches >> I sent. >> >>> pci_get_interrupt_pin(), pci_common_swizzle(), >> >> If sr-iov is the only source of these virtual buses, these are probably >> ok since VFs don't support INTx. >> >>> pci_find_upstream_pcie_bridge(), and >> >> Here the pci_is_root_bus() is after a pci_is_pcie() check, so again if >> sr-iov only (and assuming VFs properly report PCIe capability), we >> shouldn't stumble on it. >> >>> pci_bus_release_bridge_resources() all might have similar problems. >> >> This one might deserve further investigation. Thanks, > > We can fix all these places piecemeal, but that doesn't feel like a > very satisfying solution. It makes it much harder to know that each > place is correct, and this oddity of a bus with no upstream bridge is > still lying around, waiting to bite us again later. > > What other possible ways of fixing this do we have? Could we set > bus->self (multiple buses would then point to the same bridge, and I > don't know if that would break something)? Add something like a > pci_upstream_p2p_bridge() interface that would encapsulate traversing ^^^ and this name will reduce the confusion? :) > the bus->parent and bus->self links? > > Since these fake VF buses don't have a bridge that points to them, I Well, they aren't fake busses, just ARI-identifiers, which translate the B:D.F/8:5.3 format to simply a 16-bit i.d. So, VF devices should be attached to same bus->devices list as it's PF. pci_dev->bus should be same bus ptr as PF's pci_dev as well, since the VF uses all that's busses resources, support functions (cfg, dma-ops, etc.) as well. Searching the driver/pci area, support of functions like AER want the bus struct that's receiving/handling the PCIe error, associated (hw) port, etc., so another reason the VF's pci-dev bus ptr should be the same as the PF's. Logically, ARI-based VFs with a 'bus-num' value != PF bus-num value make a point-to-point PCIe link look more like a parallel-bus with a different identifier parsing -- diff. interpretation of a 16-bit field. > think the only place we keep a pointer to them is in the parent bus's > "children" list (updated in pci_add_new_bus()). And now I'm confused > about when we should use bus->children and when we should use > bus->devices and why we should have both. well, children are child busses; devices are all devices, bus-bridge & endpt devices. As for use.... seems like children should be traversed when doing bus ops. > > Does pci_walk_bus() work correctly with these VFs on fake buses? It > doesn't use "children", so I can't see how it would ever find them. > as I read pci_walk_bus(), it won't work for VFs attached to a bus-num-id that doesn't match the PF's bus-num. sure glad the VFs don't use/need pci_walk_bus()! :o ! Seems like a bug in that algorithm.... > Aren't you sorry you opened this can of worms? :) > yeah, aw has a tendency to step in it (worms would be too clean an analogy for Alex!). >>>> Signed-off-by: Alex Williamson<alex.williamson@redhat.com> >>>> --- >>>> >>>> David Ahern reported an oops from iommu drivers passing NULL into >>>> this function for the same mistake. Harden this function against >>>> assuming bus->self is valid as well. David, please include this >>>> patch as well as the iommu patches in your testing. >>>> >>>> drivers/pci/pci.c | 22 +++++++++++++++++----- >>>> 1 file changed, 17 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >>>> index f3ea977..e11a49c 100644 >>>> --- a/drivers/pci/pci.c >>>> +++ b/drivers/pci/pci.c >>>> @@ -2486,18 +2486,30 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) >>>> bool pci_acs_path_enabled(struct pci_dev *start, >>>> struct pci_dev *end, u16 acs_flags) >>>> { >>>> - struct pci_dev *pdev, *parent = start; >>>> + struct pci_dev *pdev = start; >>>> + struct pci_bus *bus; >>>> >>>> do { >>>> - pdev = parent; >>>> - >>>> if (!pci_acs_enabled(pdev, acs_flags)) >>>> return false; >>>> >>>> - if (pci_is_root_bus(pdev->bus)) >>>> + bus = pdev->bus; >>>> + >>>> + if (pci_is_root_bus(bus)) >>>> return (end == NULL); >>>> >>>> - parent = pdev->bus->self; >>>> + /* >>>> + * Skip buses without an associated bridge. In this >>>> + * case move to the parent and continue. >>>> + */ >>>> + while (!bus->self) { >>>> + if (!pci_is_root_bus(bus)) >>>> + bus = bus->parent; >>>> + else >>>> + return (end == NULL); >>>> + } >>>> + >>>> + pdev = bus->self; >>>> } while (pdev != end); >>>> >>>> return true; >>>> >> >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-pci" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Aug 7, 2012 at 2:50 PM, Don Dutile <ddutile@redhat.com> wrote: > On 08/06/2012 04:47 PM, Bjorn Helgaas wrote: >> >> On Sun, Aug 5, 2012 at 11:55 PM, Alex Williamson >> <alex.williamson@redhat.com> wrote: >>> >>> On Sun, 2012-08-05 at 23:30 -0600, Bjorn Helgaas wrote: >>>> >>>> On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson >>>> <alex.williamson@redhat.com> wrote: >>>>> >>>>> It's possible to have buses without an associated bridge >>>>> (bus->self == NULL). SR-IOV can generate such buses. When >>>>> we find these, skip to the parent bus to look for the next >>>>> ACS test. >>>> >>>> >>>> To make sure I understand the problem here, I think you're referring >>>> to the situation where an SR-IOV device can span several bus numbers, >>>> e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in >>>> the SR-IOV 1.1 spec, sec. 2.1.2. >>>> >>>> It says "All PFs must be located on the Device's captured Bus Number" >>>> -- I think that means every PF will be directly on a bridge's >>>> secondary bus and hence will have a valid dev->bus->self pointer. >>>> >>>> However, VFs need not be on the same bus number. If a VF is on >>>> (captured Bus Number plus 1), I think we allocate a new struct pci_bus >>>> for it, but there's no P2P bridge that leads to that bus, so the >>>> bus->self pointer is probably NULL. >>> >>> >>> Yes, exactly. virtfn_add_bus() is where we're creating this new bus. >>> >>>> This makes me quite nervous, because I bet there are many places that >>>> assume every non-root bus has a valid bus->self pointer -- I know I >>>> certainly had that assumption. >>>> >>>> I looked at callers of pci_is_root_bus(), and at first glance, it seems >>>> like >>>> iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), >>> >>> >>> >>> These 3 are handled by this patch, plus the intel and amd iommu patches >>> I sent. >>> >>>> pci_get_interrupt_pin(), pci_common_swizzle(), >>> >>> >>> If sr-iov is the only source of these virtual buses, these are probably >>> ok since VFs don't support INTx. >>> >>>> pci_find_upstream_pcie_bridge(), and >>> >>> >>> Here the pci_is_root_bus() is after a pci_is_pcie() check, so again if >>> sr-iov only (and assuming VFs properly report PCIe capability), we >>> shouldn't stumble on it. >>> >>>> pci_bus_release_bridge_resources() all might have similar problems. >>> >>> >>> This one might deserve further investigation. Thanks, >> >> >> We can fix all these places piecemeal, but that doesn't feel like a >> very satisfying solution. It makes it much harder to know that each >> place is correct, and this oddity of a bus with no upstream bridge is >> still lying around, waiting to bite us again later. >> >> What other possible ways of fixing this do we have? Could we set >> bus->self (multiple buses would then point to the same bridge, and I >> don't know if that would break something)? Add something like a >> pci_upstream_p2p_bridge() interface that would encapsulate traversing > > ^^^ and this name will reduce the confusion? :) I don't claim that :) I just wanted to explore other possible solutions. Changing every loop that searches the parent chain so it knows about this SR-IOV oddity doesn't seem like the ideal solution, though maybe it's the best we can do given the constraints. >> Since these fake VF buses don't have a bridge that points to them, I > > Well, they aren't fake busses, just ARI-identifiers, which translate the > B:D.F/8:5.3 > format to simply a 16-bit i.d. I think an SR-IOV device can consume multiple bus numbers even without ARI (in fact, I think ARI reduces the number of bus numbers the device requires ... e.g., a PF and 15 VFs would require two bus numbers without ARI (04:00.0 - 04:00.7 and 05:00.0 - 05:00.7) but only one bus number with ARI (04:00.0 - 04:01.7)). (I think "04:01.7" is how Linux would represent the 8-bit function number ARI gives you. You could also think of it as "04:00.0f") > So, VF devices should be attached to same bus->devices list as it's PF. I don't think it works that way today, does it? In the SR-IOV spec example in sec 2.1.2: PF 0 at 04:00.0 ARI Capable is set First VF Offset = 1, VF Stride = 1, NumVFs = 600 I think we have three separate bus->devices lists: pci_bus 04: devices list contains PF 0 and VF 0,1 through VF 0,255 pci_bus 05: devices list contains VF 0,256 - VF 0,511 pci_bus 06: devices list contains VF 0,512 - VF 0,600 > pci_dev->bus should be same bus ptr as PF's pci_dev as well, since the > VF uses all that's busses resources, support functions (cfg, dma-ops, etc.) > as well. > Searching the driver/pci area, support of functions like AER want the > bus struct that's receiving/handling the PCIe error, associated (hw) port, > etc., > so another reason the VF's pci-dev bus ptr should be the same as the PF's. Maybe every VF *should* have the same dev->bus pointer as the PF, but I don't think it does today. I think we only store the bus number in the struct pci_bus, so if we *did* give all the VFs the same dev->bus pointer and put all the VFs in the same bus->devices list, we'd have to store the bus number elsewhere, e.g., in the struct pci_dev. That might make sense, but the magnitude of a change like that makes my head hurt -- it would affect drivers, arch code, config accessors, etc. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 8/8/12 12:00 AM, Bjorn Helgaas wrote: > On Tue, Aug 7, 2012 at 2:50 PM, Don Dutile <ddutile@redhat.com> wrote: >> On 08/06/2012 04:47 PM, Bjorn Helgaas wrote: >>> >>> On Sun, Aug 5, 2012 at 11:55 PM, Alex Williamson >>> <alex.williamson@redhat.com> wrote: >>>> >>>> On Sun, 2012-08-05 at 23:30 -0600, Bjorn Helgaas wrote: >>>>> >>>>> On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson >>>>> <alex.williamson@redhat.com> wrote: >>>>>> >>>>>> It's possible to have buses without an associated bridge >>>>>> (bus->self == NULL). SR-IOV can generate such buses. When >>>>>> we find these, skip to the parent bus to look for the next >>>>>> ACS test. >>>>> >>>>> >>>>> To make sure I understand the problem here, I think you're referring >>>>> to the situation where an SR-IOV device can span several bus numbers, >>>>> e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in >>>>> the SR-IOV 1.1 spec, sec. 2.1.2. >>>>> >>>>> It says "All PFs must be located on the Device's captured Bus Number" >>>>> -- I think that means every PF will be directly on a bridge's >>>>> secondary bus and hence will have a valid dev->bus->self pointer. >>>>> >>>>> However, VFs need not be on the same bus number. If a VF is on >>>>> (captured Bus Number plus 1), I think we allocate a new struct pci_bus >>>>> for it, but there's no P2P bridge that leads to that bus, so the >>>>> bus->self pointer is probably NULL. >>>> >>>> >>>> Yes, exactly. virtfn_add_bus() is where we're creating this new bus. >>>> >>>>> This makes me quite nervous, because I bet there are many places that >>>>> assume every non-root bus has a valid bus->self pointer -- I know I >>>>> certainly had that assumption. >>>>> >>>>> I looked at callers of pci_is_root_bus(), and at first glance, it seems >>>>> like >>>>> iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), >>>> >>>> >>>> >>>> These 3 are handled by this patch, plus the intel and amd iommu patches >>>> I sent. >>>> >>>>> pci_get_interrupt_pin(), pci_common_swizzle(), >>>> >>>> >>>> If sr-iov is the only source of these virtual buses, these are probably >>>> ok since VFs don't support INTx. >>>> >>>>> pci_find_upstream_pcie_bridge(), and >>>> >>>> >>>> Here the pci_is_root_bus() is after a pci_is_pcie() check, so again if >>>> sr-iov only (and assuming VFs properly report PCIe capability), we >>>> shouldn't stumble on it. >>>> >>>>> pci_bus_release_bridge_resources() all might have similar problems. >>>> >>>> >>>> This one might deserve further investigation. Thanks, >>> >>> >>> We can fix all these places piecemeal, but that doesn't feel like a >>> very satisfying solution. It makes it much harder to know that each >>> place is correct, and this oddity of a bus with no upstream bridge is >>> still lying around, waiting to bite us again later. >>> >>> What other possible ways of fixing this do we have? Could we set >>> bus->self (multiple buses would then point to the same bridge, and I >>> don't know if that would break something)? Add something like a >>> pci_upstream_p2p_bridge() interface that would encapsulate traversing >> >> ^^^ and this name will reduce the confusion? :) > > I don't claim that :) I just wanted to explore other possible > solutions. Changing every loop that searches the parent chain so it > knows about this SR-IOV oddity doesn't seem like the ideal solution, > though maybe it's the best we can do given the constraints. > >>> Since these fake VF buses don't have a bridge that points to them, I >> >> Well, they aren't fake busses, just ARI-identifiers, which translate the >> B:D.F/8:5.3 >> format to simply a 16-bit i.d. > > I think an SR-IOV device can consume multiple bus numbers even without > ARI (in fact, I think ARI reduces the number of bus numbers the > device requires ... e.g., a PF and 15 VFs would require two bus > numbers without ARI (04:00.0 - 04:00.7 and 05:00.0 - 05:00.7) but only > one bus number with ARI (04:00.0 - 04:01.7)). (I think "04:01.7" is > how Linux would represent the 8-bit function number ARI gives you. > You could also think of it as "04:00.0f") > >> So, VF devices should be attached to same bus->devices list as it's PF. > > I don't think it works that way today, does it? In the SR-IOV spec > example in sec 2.1.2: > > PF 0 at 04:00.0 > ARI Capable is set > First VF Offset = 1, VF Stride = 1, NumVFs = 600 > > I think we have three separate bus->devices lists: > > pci_bus 04: devices list contains PF 0 and VF 0,1 through VF 0,255 > pci_bus 05: devices list contains VF 0,256 - VF 0,511 > pci_bus 06: devices list contains VF 0,512 - VF 0,600 > >> pci_dev->bus should be same bus ptr as PF's pci_dev as well, since the >> VF uses all that's busses resources, support functions (cfg, dma-ops, etc.) >> as well. >> Searching the driver/pci area, support of functions like AER want the >> bus struct that's receiving/handling the PCIe error, associated (hw) port, >> etc., >> so another reason the VF's pci-dev bus ptr should be the same as the PF's. > > Maybe every VF *should* have the same dev->bus pointer as the PF, but > I don't think it does today. I think we only store the bus number in > the struct pci_bus, so if we *did* give all the VFs the same dev->bus > pointer and put all the VFs in the same bus->devices list, we'd have > to store the bus number elsewhere, e.g., in the struct pci_dev. > > That might make sense, but the magnitude of a change like that makes > my head hurt -- it would affect drivers, arch code, config accessors, > etc. > Perhaps I misunderstand your point. VF's have shown up like this for quite a while (e.g., running 3.6.0-rc1): 05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) 05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01) 06:10.0 Ethernet controller: Illegal Vendor ID Device ffff (rev 01) 06:10.1 Ethernet controller: Illegal Vendor ID Device ffff (rev 01) .. 06:11.5 Ethernet controller: Illegal Vendor ID Device ffff (rev 01) 05:00.{0,1} are the PF's and the 06:* are the VF's (BTW, the 'Illegal Vendor ID' is new to 3.6; in 3.5 the VF's show as 'Intel Corporation 82576 Virtual Function' but that's a topic for a different thread I guess). David -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2012-08-07 at 23:00 -0700, Bjorn Helgaas wrote: > On Tue, Aug 7, 2012 at 2:50 PM, Don Dutile <ddutile@redhat.com> wrote: > > On 08/06/2012 04:47 PM, Bjorn Helgaas wrote: > >> > >> On Sun, Aug 5, 2012 at 11:55 PM, Alex Williamson > >> <alex.williamson@redhat.com> wrote: > >>> > >>> On Sun, 2012-08-05 at 23:30 -0600, Bjorn Helgaas wrote: > >>>> > >>>> On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson > >>>> <alex.williamson@redhat.com> wrote: > >>>>> > >>>>> It's possible to have buses without an associated bridge > >>>>> (bus->self == NULL). SR-IOV can generate such buses. When > >>>>> we find these, skip to the parent bus to look for the next > >>>>> ACS test. > >>>> > >>>> > >>>> To make sure I understand the problem here, I think you're referring > >>>> to the situation where an SR-IOV device can span several bus numbers, > >>>> e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in > >>>> the SR-IOV 1.1 spec, sec. 2.1.2. > >>>> > >>>> It says "All PFs must be located on the Device's captured Bus Number" > >>>> -- I think that means every PF will be directly on a bridge's > >>>> secondary bus and hence will have a valid dev->bus->self pointer. > >>>> > >>>> However, VFs need not be on the same bus number. If a VF is on > >>>> (captured Bus Number plus 1), I think we allocate a new struct pci_bus > >>>> for it, but there's no P2P bridge that leads to that bus, so the > >>>> bus->self pointer is probably NULL. > >>> > >>> > >>> Yes, exactly. virtfn_add_bus() is where we're creating this new bus. > >>> > >>>> This makes me quite nervous, because I bet there are many places that > >>>> assume every non-root bus has a valid bus->self pointer -- I know I > >>>> certainly had that assumption. > >>>> > >>>> I looked at callers of pci_is_root_bus(), and at first glance, it seems > >>>> like > >>>> iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), > >>> > >>> > >>> > >>> These 3 are handled by this patch, plus the intel and amd iommu patches > >>> I sent. > >>> > >>>> pci_get_interrupt_pin(), pci_common_swizzle(), > >>> > >>> > >>> If sr-iov is the only source of these virtual buses, these are probably > >>> ok since VFs don't support INTx. > >>> > >>>> pci_find_upstream_pcie_bridge(), and > >>> > >>> > >>> Here the pci_is_root_bus() is after a pci_is_pcie() check, so again if > >>> sr-iov only (and assuming VFs properly report PCIe capability), we > >>> shouldn't stumble on it. > >>> > >>>> pci_bus_release_bridge_resources() all might have similar problems. > >>> > >>> > >>> This one might deserve further investigation. Thanks, > >> > >> > >> We can fix all these places piecemeal, but that doesn't feel like a > >> very satisfying solution. It makes it much harder to know that each > >> place is correct, and this oddity of a bus with no upstream bridge is > >> still lying around, waiting to bite us again later. > >> > >> What other possible ways of fixing this do we have? Could we set > >> bus->self (multiple buses would then point to the same bridge, and I > >> don't know if that would break something)? Add something like a > >> pci_upstream_p2p_bridge() interface that would encapsulate traversing > > > > ^^^ and this name will reduce the confusion? :) > > I don't claim that :) I just wanted to explore other possible > solutions. Changing every loop that searches the parent chain so it > knows about this SR-IOV oddity doesn't seem like the ideal solution, > though maybe it's the best we can do given the constraints. Yep, these are the two alternatives I thought of too, set bus->self or create a helper function. The former leaves present assumptions in place, but I don't know whether we'll break something else having two buses claiming to be sourced by the same device. Maintaining the NULL self pointer almost feels like a better representation of the hardware, but requires evaluating all the current users and coming up with a helper. That's more work, but we probably want to move away from everyone manually walking pci data structures anyway. > >> Since these fake VF buses don't have a bridge that points to them, I > > > > Well, they aren't fake busses, just ARI-identifiers, which translate the > > B:D.F/8:5.3 > > format to simply a 16-bit i.d. > > I think an SR-IOV device can consume multiple bus numbers even without > ARI (in fact, I think ARI reduces the number of bus numbers the > device requires ... e.g., a PF and 15 VFs would require two bus > numbers without ARI (04:00.0 - 04:00.7 and 05:00.0 - 05:00.7) but only > one bus number with ARI (04:00.0 - 04:01.7)). (I think "04:01.7" is > how Linux would represent the 8-bit function number ARI gives you. > You could also think of it as "04:00.0f") Yep, that's what I think is happening too. I've been testing on a system with ARI, so using the same device as David, PFs at 1:00.0/1 and VFs at 1:10.0 - 1:11.5. The sr-iov capability reports VF offset: 128, stride: 2. IIRC, w/o ARI this same device reports VF offset 384 (256 + 128), which seems to match David's lspci. > > > So, VF devices should be attached to same bus->devices list as it's PF. > > I don't think it works that way today, does it? In the SR-IOV spec > example in sec 2.1.2: > > PF 0 at 04:00.0 > ARI Capable is set > First VF Offset = 1, VF Stride = 1, NumVFs = 600 > > I think we have three separate bus->devices lists: > > pci_bus 04: devices list contains PF 0 and VF 0,1 through VF 0,255 > pci_bus 05: devices list contains VF 0,256 - VF 0,511 > pci_bus 06: devices list contains VF 0,512 - VF 0,600 > > > pci_dev->bus should be same bus ptr as PF's pci_dev as well, since the > > VF uses all that's busses resources, support functions (cfg, dma-ops, etc.) > > as well. > > Searching the driver/pci area, support of functions like AER want the > > bus struct that's receiving/handling the PCIe error, associated (hw) port, > > etc., > > so another reason the VF's pci-dev bus ptr should be the same as the PF's. > > Maybe every VF *should* have the same dev->bus pointer as the PF, but > I don't think it does today. I think we only store the bus number in > the struct pci_bus, so if we *did* give all the VFs the same dev->bus > pointer and put all the VFs in the same bus->devices list, we'd have > to store the bus number elsewhere, e.g., in the struct pci_dev. > > That might make sense, but the magnitude of a change like that makes > my head hurt -- it would affect drivers, arch code, config accessors, > etc. Yep, I agree, that's a total rework of what a struct pci_bus represents. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/08/2012 11:24 AM, Alex Williamson wrote: > On Tue, 2012-08-07 at 23:00 -0700, Bjorn Helgaas wrote: >> On Tue, Aug 7, 2012 at 2:50 PM, Don Dutile<ddutile@redhat.com> wrote: >>> On 08/06/2012 04:47 PM, Bjorn Helgaas wrote: >>>> >>>> On Sun, Aug 5, 2012 at 11:55 PM, Alex Williamson >>>> <alex.williamson@redhat.com> wrote: >>>>> >>>>> On Sun, 2012-08-05 at 23:30 -0600, Bjorn Helgaas wrote: >>>>>> >>>>>> On Sat, Aug 4, 2012 at 12:19 PM, Alex Williamson >>>>>> <alex.williamson@redhat.com> wrote: >>>>>>> >>>>>>> It's possible to have buses without an associated bridge >>>>>>> (bus->self == NULL). SR-IOV can generate such buses. When >>>>>>> we find these, skip to the parent bus to look for the next >>>>>>> ACS test. >>>>>> >>>>>> >>>>>> To make sure I understand the problem here, I think you're referring >>>>>> to the situation where an SR-IOV device can span several bus numbers, >>>>>> e.g., the "VFs Spanning Multiple Bus Numbers" implementation note in >>>>>> the SR-IOV 1.1 spec, sec. 2.1.2. >>>>>> >>>>>> It says "All PFs must be located on the Device's captured Bus Number" >>>>>> -- I think that means every PF will be directly on a bridge's >>>>>> secondary bus and hence will have a valid dev->bus->self pointer. >>>>>> >>>>>> However, VFs need not be on the same bus number. If a VF is on >>>>>> (captured Bus Number plus 1), I think we allocate a new struct pci_bus >>>>>> for it, but there's no P2P bridge that leads to that bus, so the >>>>>> bus->self pointer is probably NULL. >>>>> >>>>> >>>>> Yes, exactly. virtfn_add_bus() is where we're creating this new bus. >>>>> >>>>>> This makes me quite nervous, because I bet there are many places that >>>>>> assume every non-root bus has a valid bus->self pointer -- I know I >>>>>> certainly had that assumption. >>>>>> >>>>>> I looked at callers of pci_is_root_bus(), and at first glance, it seems >>>>>> like >>>>>> iommu_init_device(), intel_iommu_add_device(), pci_acs_path_enabled(), >>>>> >>>>> >>>>> >>>>> These 3 are handled by this patch, plus the intel and amd iommu patches >>>>> I sent. >>>>> >>>>>> pci_get_interrupt_pin(), pci_common_swizzle(), >>>>> >>>>> >>>>> If sr-iov is the only source of these virtual buses, these are probably >>>>> ok since VFs don't support INTx. >>>>> >>>>>> pci_find_upstream_pcie_bridge(), and >>>>> >>>>> >>>>> Here the pci_is_root_bus() is after a pci_is_pcie() check, so again if >>>>> sr-iov only (and assuming VFs properly report PCIe capability), we >>>>> shouldn't stumble on it. >>>>> >>>>>> pci_bus_release_bridge_resources() all might have similar problems. >>>>> >>>>> >>>>> This one might deserve further investigation. Thanks, >>>> >>>> >>>> We can fix all these places piecemeal, but that doesn't feel like a >>>> very satisfying solution. It makes it much harder to know that each >>>> place is correct, and this oddity of a bus with no upstream bridge is >>>> still lying around, waiting to bite us again later. >>>> >>>> What other possible ways of fixing this do we have? Could we set >>>> bus->self (multiple buses would then point to the same bridge, and I >>>> don't know if that would break something)? Add something like a >>>> pci_upstream_p2p_bridge() interface that would encapsulate traversing >>> >>> ^^^ and this name will reduce the confusion? :) >> >> I don't claim that :) I just wanted to explore other possible >> solutions. Changing every loop that searches the parent chain so it >> knows about this SR-IOV oddity doesn't seem like the ideal solution, >> though maybe it's the best we can do given the constraints. > > Yep, these are the two alternatives I thought of too, set bus->self or > create a helper function. The former leaves present assumptions in > place, but I don't know whether we'll break something else having two > buses claiming to be sourced by the same device. Maintaining the NULL > self pointer almost feels like a better representation of the hardware, > but requires evaluating all the current users and coming up with a > helper. That's more work, but we probably want to move away from > everyone manually walking pci data structures anyway. > >>>> Since these fake VF buses don't have a bridge that points to them, I >>> >>> Well, they aren't fake busses, just ARI-identifiers, which translate the >>> B:D.F/8:5.3 >>> format to simply a 16-bit i.d. >> >> I think an SR-IOV device can consume multiple bus numbers even without >> ARI (in fact, I think ARI reduces the number of bus numbers the >> device requires ... e.g., a PF and 15 VFs would require two bus >> numbers without ARI (04:00.0 - 04:00.7 and 05:00.0 - 05:00.7) but only >> one bus number with ARI (04:00.0 - 04:01.7)). (I think "04:01.7" is >> how Linux would represent the 8-bit function number ARI gives you. >> You could also think of it as "04:00.0f") > > Yep, that's what I think is happening too. I've been testing on a > system with ARI, so using the same device as David, PFs at 1:00.0/1 and > VFs at 1:10.0 - 1:11.5. The sr-iov capability reports VF offset: 128, > stride: 2. IIRC, w/o ARI this same device reports VF offset 384 (256 + > 128), which seems to match David's lspci. > Correct, lack of ARI forces the use of more bus numbers, not less. It is 'fixed' by the iov.c code by increasing the sub_bus num register to cover the range of busses the VF stride wants to use, if those bus-num's aren't already consumed; if consumed, the VFs are inaccessible/unconfigurable. >> >>> So, VF devices should be attached to same bus->devices list as it's PF. >> >> I don't think it works that way today, does it? In the SR-IOV spec >> example in sec 2.1.2: >> >> PF 0 at 04:00.0 >> ARI Capable is set >> First VF Offset = 1, VF Stride = 1, NumVFs = 600 >> >> I think we have three separate bus->devices lists: >> >> pci_bus 04: devices list contains PF 0 and VF 0,1 through VF 0,255 >> pci_bus 05: devices list contains VF 0,256 - VF 0,511 >> pci_bus 06: devices list contains VF 0,512 - VF 0,600 >> >>> pci_dev->bus should be same bus ptr as PF's pci_dev as well, since the >>> VF uses all that's busses resources, support functions (cfg, dma-ops, etc.) >>> as well. >>> Searching the driver/pci area, support of functions like AER want the >>> bus struct that's receiving/handling the PCIe error, associated (hw) port, >>> etc., >>> so another reason the VF's pci-dev bus ptr should be the same as the PF's. >> >> Maybe every VF *should* have the same dev->bus pointer as the PF, but >> I don't think it does today. I think we only store the bus number in >> the struct pci_bus, so if we *did* give all the VFs the same dev->bus >> pointer and put all the VFs in the same bus->devices list, we'd have >> to store the bus number elsewhere, e.g., in the struct pci_dev. >> >> That might make sense, but the magnitude of a change like that makes >> my head hurt -- it would affect drivers, arch code, config accessors, >> etc. > > Yep, I agree, that's a total rework of what a struct pci_bus represents. > Thanks, > > Alex > To sum up, I believe the sw structures should reflect the hw connectivity and association. If you buy that bridge (all pun intended), then the VFs should get connected to same pci-bus as the PF b/c it physically is, and the register set for that parent bridge is the one the PF is connected to ... bus->self == NULL will cause heartburn for AERs associated with VFs .... no dev to point to in order to do AER handling ... :( Likewise, additional bus structs could be generated, but they would require ptrs in them to ensure they point to the same structures (same dev-lists, same resource structs, etc.) and/or inherit the PF fields/attributes. Changes to a pci-dev associated with a bridge will also have to check if 'peer' (VF/SRIOV) busses are associated with that bridge/pci-dev, and duplicate updates -- which begs for the same bus struct to be used. I think we have to break the identification of devices, B:D.F (VFs specifically) and the existing 1-to-1 relationship with a hw device. Helper functions to change VF B:D.F formatting to be PF-B:[potentially->-32-D]:F would be one way to do it. Changing devfn in pci-dev struct from u8 to u16 would one option; putting the entire B:D.F into pci-dev and making it u32 would probably be the cleanest (and I'm familiar with at least one OS & it's PCI developer that did just that ;-), but not for these ARI reasons), and then macros like PCI_FUNC() and PCI_SLOT() could be changed appropriately, as well as a new PCI_BUS() macro. The only serious gotcha I see in the B:D.F interpretation is how it is presented in sysfs, since a D value >32 could cause some heartburn for user apps traversing sysfs, and how those values would get copy / transposed to other interaces, e.g., libvirt scanning sysfs for VFs, and how the B:D.F is parsed and passed down to qemu when a device is assigned to a guest. The 32-bit B:D.F in pci-dev, and new helpers to present & parse based on that field are looking better to me now... As for PCI tree traversal, endpoints shouldn't be doing it; I'll bet the majority that do, are looking for the parent bus to get/print the bus-num; That leaves PCI core code for bus scanning, hot plug, ACS tree traversal, etc., which should be manageable in a change like the ones I mention in the previous paragraph. Alex: does VFIO use pci functions to do traversal, or does it implement its own functions to travers pci trees? So, as stated previously, this certainly hurts the brain cells, but not as much as I thought it would. Someday, I'll try to figure out how mr-iov-based busses come & go in the pci dev tree! :-o ! - Don -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index f3ea977..e11a49c 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2486,18 +2486,30 @@ bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags) bool pci_acs_path_enabled(struct pci_dev *start, struct pci_dev *end, u16 acs_flags) { - struct pci_dev *pdev, *parent = start; + struct pci_dev *pdev = start; + struct pci_bus *bus; do { - pdev = parent; - if (!pci_acs_enabled(pdev, acs_flags)) return false; - if (pci_is_root_bus(pdev->bus)) + bus = pdev->bus; + + if (pci_is_root_bus(bus)) return (end == NULL); - parent = pdev->bus->self; + /* + * Skip buses without an associated bridge. In this + * case move to the parent and continue. + */ + while (!bus->self) { + if (!pci_is_root_bus(bus)) + bus = bus->parent; + else + return (end == NULL); + } + + pdev = bus->self; } while (pdev != end); return true;
It's possible to have buses without an associated bridge (bus->self == NULL). SR-IOV can generate such buses. When we find these, skip to the parent bus to look for the next ACS test. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> --- David Ahern reported an oops from iommu drivers passing NULL into this function for the same mistake. Harden this function against assuming bus->self is valid as well. David, please include this patch as well as the iommu patches in your testing. drivers/pci/pci.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html