diff mbox

[RFC,1/2] PCI: Add PCI device flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS

Message ID 1455487501-28630-1-git-send-email-jchandra@broadcom.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Jayachandran C. Feb. 14, 2016, 10:05 p.m. UTC
Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
that should not be considered during DMA alias search. This is
to support hardware (in this case Broadcom Vulcan PCIe subsystem)
that has internal bridges which have either missing or wrong PCIe
capabilities.

Update the function pci_for_each_dma_alias() to skip bridges with
this flag set.

Signed-off-by: Jayachandran C <jchandra@broadcom.com>
---

This patch is an RFC, if there is a better way to do this, please
let me know.

Thanks,
JC.

 drivers/pci/search.c | 2 ++
 include/linux/pci.h  | 2 ++
 2 files changed, 4 insertions(+)

Comments

Bjorn Helgaas Feb. 15, 2016, 6:20 p.m. UTC | #1
[+cc Alex, iommu list]

On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:
> Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
> that should not be considered during DMA alias search. This is
> to support hardware (in this case Broadcom Vulcan PCIe subsystem)
> that has internal bridges which have either missing or wrong PCIe
> capabilities.

This needs more explanation, like what exactly is wrong with this
device?  A missing PCIe capability might cause other problems.

What problem does this fix?  Without these patches, do we merely add
aliases that are unnecessary?  Do we crash because something goes
wrong in the pci_pcie_type() switch because of the incorrect
capability?

> Update the function pci_for_each_dma_alias() to skip bridges with
> this flag set.
> 
> Signed-off-by: Jayachandran C <jchandra@broadcom.com>
> ---
> 
> This patch is an RFC, if there is a better way to do this, please
> let me know.
> 
> Thanks,
> JC.
> 
>  drivers/pci/search.c | 2 ++
>  include/linux/pci.h  | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index a20ce7d..e5296aa 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -55,6 +55,8 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
>  			continue;
>  
>  		tmp = bus->self;
> +		if (tmp->dev_flags & PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS)
> +			continue;
>  
>  		/*
>  		 * PCIe-to-PCI/X bridges alias transactions from downstream
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 27df4a6..b4d8215 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -182,6 +182,8 @@ enum pci_dev_flags {
>  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
>  	/* Get VPD from function 0 VPD */
>  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> +	/* Bridge should be ignored for alias search  */
> +	PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS = (__force pci_dev_flags_t) (1 << 9),
>  };
>  
>  enum pci_irq_reroute_variant {
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Feb. 15, 2016, 7:39 p.m. UTC | #2
On Mon, 15 Feb 2016 12:20:23 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> [+cc Alex, iommu list]
> 
> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:
> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
> > that should not be considered during DMA alias search. This is
> > to support hardware (in this case Broadcom Vulcan PCIe subsystem)
> > that has internal bridges which have either missing or wrong PCIe
> > capabilities.  

I figured this would come at some point, the right answer is of course
to follow the PCIe spec and implement the required PCIe capability in
the hardware.

> 
> This needs more explanation, like what exactly is wrong with this
> device?  A missing PCIe capability might cause other problems.
> 
> What problem does this fix?  Without these patches, do we merely add
> aliases that are unnecessary?  Do we crash because something goes
> wrong in the pci_pcie_type() switch because of the incorrect
> capability?

The change takes the same code path as it would for a real PCIe bridge
port (downstream/upstream/root), which means they want to skip adding
this bridge as an alias of the device.  So we're adding in aliases that
don't exist, the bridge itself.

If anything I'd suggest a flag that actually tries to address the
problem rather than a symptom of the problem.  For example, maybe the
flag should be PCI_DEV_FLAGS_IS_PCIE.  Maybe pci_is_pcie() should even
take that into account.  That has some trickle through for
pci_pcie_type() and all the accessor functions, but maybe it's a
cleaner solution overall (or maybe it explodes further).  Thanks,

Alex

> > Update the function pci_for_each_dma_alias() to skip bridges with
> > this flag set.
> > 
> > Signed-off-by: Jayachandran C <jchandra@broadcom.com>
> > ---
> > 
> > This patch is an RFC, if there is a better way to do this, please
> > let me know.
> > 
> > Thanks,
> > JC.
> > 
> >  drivers/pci/search.c | 2 ++
> >  include/linux/pci.h  | 2 ++
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> > index a20ce7d..e5296aa 100644
> > --- a/drivers/pci/search.c
> > +++ b/drivers/pci/search.c
> > @@ -55,6 +55,8 @@ int pci_for_each_dma_alias(struct pci_dev *pdev,
> >  			continue;
> >  
> >  		tmp = bus->self;
> > +		if (tmp->dev_flags & PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS)
> > +			continue;
> >  
> >  		/*
> >  		 * PCIe-to-PCI/X bridges alias transactions from downstream
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 27df4a6..b4d8215 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -182,6 +182,8 @@ enum pci_dev_flags {
> >  	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
> >  	/* Get VPD from function 0 VPD */
> >  	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
> > +	/* Bridge should be ignored for alias search  */
> > +	PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS = (__force pci_dev_flags_t) (1 << 9),
> >  };
> >  
> >  enum pci_irq_reroute_variant {
> > -- 
> > 1.9.1
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html  

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jayachandran Chandrashekaran Nair Feb. 16, 2016, 9:08 p.m. UTC | #3
On Tue, Feb 16, 2016 at 1:09 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Mon, 15 Feb 2016 12:20:23 -0600
> Bjorn Helgaas <helgaas@kernel.org> wrote:
>
>> [+cc Alex, iommu list]
>>
>> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:
>> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
>> > that should not be considered during DMA alias search. This is
>> > to support hardware (in this case Broadcom Vulcan PCIe subsystem)
>> > that has internal bridges which have either missing or wrong PCIe
>> > capabilities.
>
> I figured this would come at some point, the right answer is of course
> to follow the PCIe spec and implement the required PCIe capability in
> the hardware.

There are  PCIe controllers on the chip that follows the spec, the issue is
how it is integrated to the northbridge (equivalent) on the chip, I have
tried to explain it below.

>> This needs more explanation, like what exactly is wrong with this
>> device?  A missing PCIe capability might cause other problems.
>>
>> What problem does this fix?  Without these patches, do we merely add
>> aliases that are unnecessary?  Do we crash because something goes
>> wrong in the pci_pcie_type() switch because of the incorrect
>> capability?

Here's how (for example) how the PCI enumeration of a 2 node Vulcan
processor will look like:


[0] +-0.0.0--[1]---+--1.a.0----[2]-----2.0.0---[3]----3.0.0
    |              +--1.a.1----[4]-----4.0.0---[5]----5.0.0
    |              .
    |              ... etc...
    |
    +-0.0.1--[10]--+-10.a.0----[11]---11.0.0---[12]---12.0.0
                   +-10.a.1----[13]---13.0.0---[14]---14.0.0
                   .
                   ... etc...

The devices 0.0.x and x.a.x are glue bridges that are there to
bring the real PCIe controllers (pcie cap type 4) 2.0.0, 4.0.0,
11.0.0, 13.0.0 under a single PCI hierarchy. The x.a.x bridges
have a PCIe capability (type 8) and 0.0.x does not have any pcie
capability.

In case of Vulcan, both the GICv3 ITS driver code and the SMMUv3
driver code does dma alias walk to find the device id to use
in ITS and SMMU. In both cases it will ignore the x.0.0 bridges
since they are type root port, but will continue on up and end
up with incorrect device id.

The flag I have added is to make the pci_for_each_dma_alias()
ignore the last 2 levels of glue/internal bridges.

> The change takes the same code path as it would for a real PCIe bridge
> port (downstream/upstream/root), which means they want to skip adding
> this bridge as an alias of the device.  So we're adding in aliases that
> don't exist, the bridge itself.
>
> If anything I'd suggest a flag that actually tries to address the
> problem rather than a symptom of the problem.  For example, maybe the
> flag should be PCI_DEV_FLAGS_IS_PCIE.  Maybe pci_is_pcie() should even
> take that into account.  That has some trickle through for
> pci_pcie_type() and all the accessor functions, but maybe it's a
> cleaner solution overall (or maybe it explodes further).  Thanks,

I didn't really want to mark the glue bridges as PCIe or have fake
PCIe capability there, the obvious simple solution was to add
the flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS

Any suggestions/comments on how to do this better would be welcome.

Thanks,
JC.
[Using gmail due to IT transition, hope the ascii art makes it thru]
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Feb. 16, 2016, 10:25 p.m. UTC | #4
On Wed, 17 Feb 2016 02:38:24 +0530
Jayachandran Chandrashekaran Nair
<jayachandran.chandrashekaran@broadcom.com> wrote:

> On Tue, Feb 16, 2016 at 1:09 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Mon, 15 Feb 2016 12:20:23 -0600
> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >  
> >> [+cc Alex, iommu list]
> >>
> >> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:  
> >> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
> >> > that should not be considered during DMA alias search. This is
> >> > to support hardware (in this case Broadcom Vulcan PCIe subsystem)
> >> > that has internal bridges which have either missing or wrong PCIe
> >> > capabilities.  
> >
> > I figured this would come at some point, the right answer is of course
> > to follow the PCIe spec and implement the required PCIe capability in
> > the hardware.  
> 
> There are  PCIe controllers on the chip that follows the spec, the issue is
> how it is integrated to the northbridge (equivalent) on the chip, I have
> tried to explain it below.
> 
> >> This needs more explanation, like what exactly is wrong with this
> >> device?  A missing PCIe capability might cause other problems.
> >>
> >> What problem does this fix?  Without these patches, do we merely add
> >> aliases that are unnecessary?  Do we crash because something goes
> >> wrong in the pci_pcie_type() switch because of the incorrect
> >> capability?  
> 
> Here's how (for example) how the PCI enumeration of a 2 node Vulcan
> processor will look like:
> 
> 
> [0] +-0.0.0--[1]---+--1.a.0----[2]-----2.0.0---[3]----3.0.0
>     |              +--1.a.1----[4]-----4.0.0---[5]----5.0.0
>     |              .
>     |              ... etc...
>     |
>     +-0.0.1--[10]--+-10.a.0----[11]---11.0.0---[12]---12.0.0
>                    +-10.a.1----[13]---13.0.0---[14]---14.0.0
>                    .
>                    ... etc...

So we have:

    "Glue Bridge"---"Glue Bridges"---Root Ports---Endpoints
      (no pcie)      (pci/x-pcie)

> 
> The devices 0.0.x and x.a.x are glue bridges that are there to
> bring the real PCIe controllers (pcie cap type 4) 2.0.0, 4.0.0,
> 11.0.0, 13.0.0 under a single PCI hierarchy. The x.a.x bridges
> have a PCIe capability (type 8) and 0.0.x does not have any pcie
> capability.
> 
> In case of Vulcan, both the GICv3 ITS driver code and the SMMUv3
> driver code does dma alias walk to find the device id to use
> in ITS and SMMU. In both cases it will ignore the x.0.0 bridges
> since they are type root port, but will continue on up and end
> up with incorrect device id.
> 
> The flag I have added is to make the pci_for_each_dma_alias()
> ignore the last 2 levels of glue/internal bridges.

Per the PCIe spec, I'm not even sure what you have is a valid
hierarchy, root ports sit on root complexes, not behind a PCI-to-PCIe
bridge.  So really you're pretending that the downstream "glue bridge"
is your host bridge and therefore root complex, but the PCI topology
would clearly dictate that the top-most bus is conventional PCI and
therefore everything is an alias of everything else and the DMA alias
code is doing exactly what it should.

Also note that the caller of pci_for_each_dma_alias() owns the callback
function triggered for each alias, that function could choose to prune
out these "glue bridges" itself if that's the appropriate thing to do.
Where do the SMMU and ITS actually reside in the above diagram?  You
can use the callout function to stop the traversal anywhere you wish,
it's free to return a -errno to stop or positive number, which the
caller can interpret as error or non-failure stop condition.

You could even think about changing what Linux considers to be the host
bridge.  Maybe these glue bridges shouldn't even be visible to Linux,
would that also solve the issue with the broken BAR?

> > The change takes the same code path as it would for a real PCIe bridge
> > port (downstream/upstream/root), which means they want to skip adding
> > this bridge as an alias of the device.  So we're adding in aliases that
> > don't exist, the bridge itself.
> >
> > If anything I'd suggest a flag that actually tries to address the
> > problem rather than a symptom of the problem.  For example, maybe the
> > flag should be PCI_DEV_FLAGS_IS_PCIE.  Maybe pci_is_pcie() should even
> > take that into account.  That has some trickle through for
> > pci_pcie_type() and all the accessor functions, but maybe it's a
> > cleaner solution overall (or maybe it explodes further).  Thanks,  
> 
> I didn't really want to mark the glue bridges as PCIe or have fake
> PCIe capability there, the obvious simple solution was to add
> the flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS
> 
> Any suggestions/comments on how to do this better would be welcome.

I definitely don't think either flag idea is the right solution, I
think you've actually already got the tools you need to solve it by
putting the intelligence in the callback function or by going further
down the path of pretending the glue bridge is really a host bridge.
Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jayachandran Chandrashekaran Nair Feb. 17, 2016, 11:45 a.m. UTC | #5
On Wed, Feb 17, 2016 at 3:55 AM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Wed, 17 Feb 2016 02:38:24 +0530
> Jayachandran Chandrashekaran Nair
> <jayachandran.chandrashekaran@broadcom.com> wrote:
>
>> On Tue, Feb 16, 2016 at 1:09 AM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Mon, 15 Feb 2016 12:20:23 -0600
>> > Bjorn Helgaas <helgaas@kernel.org> wrote:
>> >
>> >> [+cc Alex, iommu list]
>> >>
>> >> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:
>> >> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
>> >> > that should not be considered during DMA alias search. This is
>> >> > to support hardware (in this case Broadcom Vulcan PCIe subsystem)
>> >> > that has internal bridges which have either missing or wrong PCIe
>> >> > capabilities.
>> >
>> > I figured this would come at some point, the right answer is of course
>> > to follow the PCIe spec and implement the required PCIe capability in
>> > the hardware.
>>
>> There are  PCIe controllers on the chip that follows the spec, the issue is
>> how it is integrated to the northbridge (equivalent) on the chip, I have
>> tried to explain it below.
>>
>> >> This needs more explanation, like what exactly is wrong with this
>> >> device?  A missing PCIe capability might cause other problems.
>> >>
>> >> What problem does this fix?  Without these patches, do we merely add
>> >> aliases that are unnecessary?  Do we crash because something goes
>> >> wrong in the pci_pcie_type() switch because of the incorrect
>> >> capability?
>>
>> Here's how (for example) how the PCI enumeration of a 2 node Vulcan
>> processor will look like:
>>
>>
>> [0] +-0.0.0--[1]---+--1.a.0----[2]-----2.0.0---[3]----3.0.0
>>     |              +--1.a.1----[4]-----4.0.0---[5]----5.0.0
>>     |              .
>>     |              ... etc...
>>     |
>>     +-0.0.1--[10]--+-10.a.0----[11]---11.0.0---[12]---12.0.0
>>                    +-10.a.1----[13]---13.0.0---[14]---14.0.0
>>                    .
>>                    ... etc...
>
> So we have:
>
>     "Glue Bridge"---"Glue Bridges"---Root Ports---Endpoints
>       (no pcie)      (pci/x-pcie)

The top level is one glue bridge per chip in a multi-chip board,
but otherwise this is accurate.

>>
>> The devices 0.0.x and x.a.x are glue bridges that are there to
>> bring the real PCIe controllers (pcie cap type 4) 2.0.0, 4.0.0,
>> 11.0.0, 13.0.0 under a single PCI hierarchy. The x.a.x bridges
>> have a PCIe capability (type 8) and 0.0.x does not have any pcie
>> capability.
>>
>> In case of Vulcan, both the GICv3 ITS driver code and the SMMUv3
>> driver code does dma alias walk to find the device id to use
>> in ITS and SMMU. In both cases it will ignore the x.0.0 bridges
>> since they are type root port, but will continue on up and end
>> up with incorrect device id.
>>
>> The flag I have added is to make the pci_for_each_dma_alias()
>> ignore the last 2 levels of glue/internal bridges.
>
> Per the PCIe spec, I'm not even sure what you have is a valid
> hierarchy, root ports sit on root complexes, not behind a PCI-to-PCIe
> bridge.  So really you're pretending that the downstream "glue bridge"
> is your host bridge and therefore root complex, but the PCI topology
> would clearly dictate that the top-most bus is conventional PCI and
> therefore everything is an alias of everything else and the DMA alias
> code is doing exactly what it should.

Yes, I am not arguing that there is any issue in the current code. I
am trying to figure out the correct way to implement the quirk. We
have to support the hardware we have, not the hardware we want to
have :)

> Also note that the caller of pci_for_each_dma_alias() owns the callback
> function triggered for each alias, that function could choose to prune
> out these "glue bridges" itself if that's the appropriate thing to do.

I had implemented this first, but moved to the current approach because
it is needed in multiple places. The problem is: "On vulcan, while
going up the pcie bus heirarchy for finding aliases, skip the glue bridges",
So the logical place to put the fix is in pci_for_each_dma_alias()

> Where do the SMMU and ITS actually reside in the above diagram?  You
> can use the callout function to stop the traversal anywhere you wish,
> it's free to return a -errno to stop or positive number, which the
> caller can interpret as error or non-failure stop condition.

The SMMU (for translation requests) and ITS (for MSI writes for
interrupts) are connected directly to the proper PCIe controller
(2/4/11/13.0.0 above)

> You could even think about changing what Linux considers to be the host
> bridge.  Maybe these glue bridges shouldn't even be visible to Linux,
> would that also solve the issue with the broken BAR?

The glue bridges have to seen by Linux for assigning resources like
bus ranges and memory windows. So these are required bridges in
the topology and will work with the standard linux code
(provided we skip them for aliases, and and ignore the BAR).

>> > The change takes the same code path as it would for a real PCIe bridge
>> > port (downstream/upstream/root), which means they want to skip adding
>> > this bridge as an alias of the device.  So we're adding in aliases that
>> > don't exist, the bridge itself.
>> >
>> > If anything I'd suggest a flag that actually tries to address the
>> > problem rather than a symptom of the problem.  For example, maybe the
>> > flag should be PCI_DEV_FLAGS_IS_PCIE.  Maybe pci_is_pcie() should even
>> > take that into account.  That has some trickle through for
>> > pci_pcie_type() and all the accessor functions, but maybe it's a
>> > cleaner solution overall (or maybe it explodes further).  Thanks,
>>
>> I didn't really want to mark the glue bridges as PCIe or have fake
>> PCIe capability there, the obvious simple solution was to add
>> the flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS
>>
>> Any suggestions/comments on how to do this better would be welcome.
>
> I definitely don't think either flag idea is the right solution, I
> think you've actually already got the tools you need to solve it by
> putting the intelligence in the callback function or by going further
> down the path of pretending the glue bridge is really a host bridge.

If I understand your suggestion, I need to fake the PCIe capability
for the glue bridges, which may be reasonable for the second level.
But for the first level, there is no pcie capability and faking that
becomes more difficult.

If the concern is adding a flag for just one platform, I can check
for the vendor ID/device ID in the pci_for_each_dma_alias()
function, which would be slightly uglier, but still a simple and
maintainable solution.

Another option is to break the pci_for_each_dma_alias when it
reaches to root port (for vulcan), this too can be done without
a flag.

Thanks,
JC.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Feb. 17, 2016, 3:28 p.m. UTC | #6
On Wed, 17 Feb 2016 17:15:09 +0530
Jayachandran Chandrashekaran Nair
<jayachandran.chandrashekaran@broadcom.com> wrote:

> On Wed, Feb 17, 2016 at 3:55 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Wed, 17 Feb 2016 02:38:24 +0530
> > Jayachandran Chandrashekaran Nair
> > <jayachandran.chandrashekaran@broadcom.com> wrote:
> >  
> >> On Tue, Feb 16, 2016 at 1:09 AM, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:  
> >> > On Mon, 15 Feb 2016 12:20:23 -0600
> >> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >> >  
> >> >> [+cc Alex, iommu list]
> >> >>
> >> >> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:  
> >> >> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
> >> >> > that should not be considered during DMA alias search. This is
> >> >> > to support hardware (in this case Broadcom Vulcan PCIe subsystem)
> >> >> > that has internal bridges which have either missing or wrong PCIe
> >> >> > capabilities.  
> >> >
> >> > I figured this would come at some point, the right answer is of course
> >> > to follow the PCIe spec and implement the required PCIe capability in
> >> > the hardware.  
> >>
> >> There are  PCIe controllers on the chip that follows the spec, the issue is
> >> how it is integrated to the northbridge (equivalent) on the chip, I have
> >> tried to explain it below.
> >>  
> >> >> This needs more explanation, like what exactly is wrong with this
> >> >> device?  A missing PCIe capability might cause other problems.
> >> >>
> >> >> What problem does this fix?  Without these patches, do we merely add
> >> >> aliases that are unnecessary?  Do we crash because something goes
> >> >> wrong in the pci_pcie_type() switch because of the incorrect
> >> >> capability?  
> >>
> >> Here's how (for example) how the PCI enumeration of a 2 node Vulcan
> >> processor will look like:
> >>
> >>
> >> [0] +-0.0.0--[1]---+--1.a.0----[2]-----2.0.0---[3]----3.0.0
> >>     |              +--1.a.1----[4]-----4.0.0---[5]----5.0.0
> >>     |              .
> >>     |              ... etc...
> >>     |
> >>     +-0.0.1--[10]--+-10.a.0----[11]---11.0.0---[12]---12.0.0
> >>                    +-10.a.1----[13]---13.0.0---[14]---14.0.0
> >>                    .
> >>                    ... etc...  
> >
> > So we have:
> >
> >     "Glue Bridge"---"Glue Bridges"---Root Ports---Endpoints
> >       (no pcie)      (pci/x-pcie)  
> 
> The top level is one glue bridge per chip in a multi-chip board,
> but otherwise this is accurate.
> 
> >>
> >> The devices 0.0.x and x.a.x are glue bridges that are there to
> >> bring the real PCIe controllers (pcie cap type 4) 2.0.0, 4.0.0,
> >> 11.0.0, 13.0.0 under a single PCI hierarchy. The x.a.x bridges
> >> have a PCIe capability (type 8) and 0.0.x does not have any pcie
> >> capability.
> >>
> >> In case of Vulcan, both the GICv3 ITS driver code and the SMMUv3
> >> driver code does dma alias walk to find the device id to use
> >> in ITS and SMMU. In both cases it will ignore the x.0.0 bridges
> >> since they are type root port, but will continue on up and end
> >> up with incorrect device id.
> >>
> >> The flag I have added is to make the pci_for_each_dma_alias()
> >> ignore the last 2 levels of glue/internal bridges.  
> >
> > Per the PCIe spec, I'm not even sure what you have is a valid
> > hierarchy, root ports sit on root complexes, not behind a PCI-to-PCIe
> > bridge.  So really you're pretending that the downstream "glue bridge"
> > is your host bridge and therefore root complex, but the PCI topology
> > would clearly dictate that the top-most bus is conventional PCI and
> > therefore everything is an alias of everything else and the DMA alias
> > code is doing exactly what it should.  
> 
> Yes, I am not arguing that there is any issue in the current code. I
> am trying to figure out the correct way to implement the quirk. We
> have to support the hardware we have, not the hardware we want to
> have :)
> 
> > Also note that the caller of pci_for_each_dma_alias() owns the callback
> > function triggered for each alias, that function could choose to prune
> > out these "glue bridges" itself if that's the appropriate thing to do.  
> 
> I had implemented this first, but moved to the current approach because
> it is needed in multiple places. The problem is: "On vulcan, while
> going up the pcie bus heirarchy for finding aliases, skip the glue bridges",
> So the logical place to put the fix is in pci_for_each_dma_alias()
> 
> > Where do the SMMU and ITS actually reside in the above diagram?  You
> > can use the callout function to stop the traversal anywhere you wish,
> > it's free to return a -errno to stop or positive number, which the
> > caller can interpret as error or non-failure stop condition.  
> 
> The SMMU (for translation requests) and ITS (for MSI writes for
> interrupts) are connected directly to the proper PCIe controller
> (2/4/11/13.0.0 above)


If the translation unit is on the root port then DMA aliases upstream of
the translation unit are irrelevant and the callback function should
stop the traversal at that point.

> > You could even think about changing what Linux considers to be the host
> > bridge.  Maybe these glue bridges shouldn't even be visible to Linux,
> > would that also solve the issue with the broken BAR?  
> 
> The glue bridges have to seen by Linux for assigning resources like
> bus ranges and memory windows. So these are required bridges in
> the topology and will work with the standard linux code
> (provided we skip them for aliases, and and ignore the BAR).
> 
> >> > The change takes the same code path as it would for a real PCIe bridge
> >> > port (downstream/upstream/root), which means they want to skip adding
> >> > this bridge as an alias of the device.  So we're adding in aliases that
> >> > don't exist, the bridge itself.
> >> >
> >> > If anything I'd suggest a flag that actually tries to address the
> >> > problem rather than a symptom of the problem.  For example, maybe the
> >> > flag should be PCI_DEV_FLAGS_IS_PCIE.  Maybe pci_is_pcie() should even
> >> > take that into account.  That has some trickle through for
> >> > pci_pcie_type() and all the accessor functions, but maybe it's a
> >> > cleaner solution overall (or maybe it explodes further).  Thanks,  
> >>
> >> I didn't really want to mark the glue bridges as PCIe or have fake
> >> PCIe capability there, the obvious simple solution was to add
> >> the flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS
> >>
> >> Any suggestions/comments on how to do this better would be welcome.  
> >
> > I definitely don't think either flag idea is the right solution, I
> > think you've actually already got the tools you need to solve it by
> > putting the intelligence in the callback function or by going further
> > down the path of pretending the glue bridge is really a host bridge.  
> 
> If I understand your suggestion, I need to fake the PCIe capability
> for the glue bridges, which may be reasonable for the second level.
> But for the first level, there is no pcie capability and faking that
> becomes more difficult.

That's not what I'm suggesting.
 
> If the concern is adding a flag for just one platform, I can check
> for the vendor ID/device ID in the pci_for_each_dma_alias()
> function, which would be slightly uglier, but still a simple and
> maintainable solution.
> 
> Another option is to break the pci_for_each_dma_alias when it
> reaches to root port (for vulcan), this too can be done without
> a flag.

Exactly, this can be done by the callback function provided on vulcan
systems and requires no core changes.  The IOMMU needs to provide a
callback function that makes sense for collecting aliases on the
specific platform.  If the translation unit is not on the root bus, the
callback function should stop the topology walk earlier.  Thanks,

Alex
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jayachandran Chandrashekaran Nair Feb. 18, 2016, 1:27 p.m. UTC | #7
On Wed, Feb 17, 2016 at 8:58 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Wed, 17 Feb 2016 17:15:09 +0530
> Jayachandran Chandrashekaran Nair
> <jayachandran.chandrashekaran@broadcom.com> wrote:
>
>> On Wed, Feb 17, 2016 at 3:55 AM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Wed, 17 Feb 2016 02:38:24 +0530
>> > Jayachandran Chandrashekaran Nair
>> > <jayachandran.chandrashekaran@broadcom.com> wrote:
>> >
>> >> On Tue, Feb 16, 2016 at 1:09 AM, Alex Williamson
>> >> <alex.williamson@redhat.com> wrote:
>> >> > On Mon, 15 Feb 2016 12:20:23 -0600
>> >> > Bjorn Helgaas <helgaas@kernel.org> wrote:
>> >> >
>> >> >> [+cc Alex, iommu list]
>> >> >>
>> >> >> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:
>> >> >> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
>> >> >> > that should not be considered during DMA alias search. This is
>> >> >> > to support hardware (in this case Broadcom Vulcan PCIe subsystem)
>> >> >> > that has internal bridges which have either missing or wrong PCIe
>> >> >> > capabilities.
>> >> >
>> >> > I figured this would come at some point, the right answer is of course
>> >> > to follow the PCIe spec and implement the required PCIe capability in
>> >> > the hardware.
>> >>
>> >> There are  PCIe controllers on the chip that follows the spec, the issue is
>> >> how it is integrated to the northbridge (equivalent) on the chip, I have
>> >> tried to explain it below.
>> >>
>> >> >> This needs more explanation, like what exactly is wrong with this
>> >> >> device?  A missing PCIe capability might cause other problems.
>> >> >>
>> >> >> What problem does this fix?  Without these patches, do we merely add
>> >> >> aliases that are unnecessary?  Do we crash because something goes
>> >> >> wrong in the pci_pcie_type() switch because of the incorrect
>> >> >> capability?
>> >>
>> >> Here's how (for example) how the PCI enumeration of a 2 node Vulcan
>> >> processor will look like:
>> >>
>> >>
>> >> [0] +-0.0.0--[1]---+--1.a.0----[2]-----2.0.0---[3]----3.0.0
>> >>     |              +--1.a.1----[4]-----4.0.0---[5]----5.0.0
>> >>     |              .
>> >>     |              ... etc...
>> >>     |
>> >>     +-0.0.1--[10]--+-10.a.0----[11]---11.0.0---[12]---12.0.0
>> >>                    +-10.a.1----[13]---13.0.0---[14]---14.0.0
>> >>                    .
>> >>                    ... etc...
>> >
>> > So we have:
>> >
>> >     "Glue Bridge"---"Glue Bridges"---Root Ports---Endpoints
>> >       (no pcie)      (pci/x-pcie)
>>
>> The top level is one glue bridge per chip in a multi-chip board,
>> but otherwise this is accurate.
>>
>> >>
>> >> The devices 0.0.x and x.a.x are glue bridges that are there to
>> >> bring the real PCIe controllers (pcie cap type 4) 2.0.0, 4.0.0,
>> >> 11.0.0, 13.0.0 under a single PCI hierarchy. The x.a.x bridges
>> >> have a PCIe capability (type 8) and 0.0.x does not have any pcie
>> >> capability.
>> >>
>> >> In case of Vulcan, both the GICv3 ITS driver code and the SMMUv3
>> >> driver code does dma alias walk to find the device id to use
>> >> in ITS and SMMU. In both cases it will ignore the x.0.0 bridges
>> >> since they are type root port, but will continue on up and end
>> >> up with incorrect device id.
>> >>
>> >> The flag I have added is to make the pci_for_each_dma_alias()
>> >> ignore the last 2 levels of glue/internal bridges.
>> >
>> > Per the PCIe spec, I'm not even sure what you have is a valid
>> > hierarchy, root ports sit on root complexes, not behind a PCI-to-PCIe
>> > bridge.  So really you're pretending that the downstream "glue bridge"
>> > is your host bridge and therefore root complex, but the PCI topology
>> > would clearly dictate that the top-most bus is conventional PCI and
>> > therefore everything is an alias of everything else and the DMA alias
>> > code is doing exactly what it should.
>>
>> Yes, I am not arguing that there is any issue in the current code. I
>> am trying to figure out the correct way to implement the quirk. We
>> have to support the hardware we have, not the hardware we want to
>> have :)
>>
>> > Also note that the caller of pci_for_each_dma_alias() owns the callback
>> > function triggered for each alias, that function could choose to prune
>> > out these "glue bridges" itself if that's the appropriate thing to do.
>>
>> I had implemented this first, but moved to the current approach because
>> it is needed in multiple places. The problem is: "On vulcan, while
>> going up the pcie bus heirarchy for finding aliases, skip the glue bridges",
>> So the logical place to put the fix is in pci_for_each_dma_alias()
>>
>> > Where do the SMMU and ITS actually reside in the above diagram?  You
>> > can use the callout function to stop the traversal anywhere you wish,
>> > it's free to return a -errno to stop or positive number, which the
>> > caller can interpret as error or non-failure stop condition.
>>
>> The SMMU (for translation requests) and ITS (for MSI writes for
>> interrupts) are connected directly to the proper PCIe controller
>> (2/4/11/13.0.0 above)
>
>
> If the translation unit is on the root port then DMA aliases upstream of
> the translation unit are irrelevant and the callback function should
> stop the traversal at that point.
>
>> > You could even think about changing what Linux considers to be the host
>> > bridge.  Maybe these glue bridges shouldn't even be visible to Linux,
>> > would that also solve the issue with the broken BAR?
>>
>> The glue bridges have to seen by Linux for assigning resources like
>> bus ranges and memory windows. So these are required bridges in
>> the topology and will work with the standard linux code
>> (provided we skip them for aliases, and and ignore the BAR).
>>
>> >> > The change takes the same code path as it would for a real PCIe bridge
>> >> > port (downstream/upstream/root), which means they want to skip adding
>> >> > this bridge as an alias of the device.  So we're adding in aliases that
>> >> > don't exist, the bridge itself.
>> >> >
>> >> > If anything I'd suggest a flag that actually tries to address the
>> >> > problem rather than a symptom of the problem.  For example, maybe the
>> >> > flag should be PCI_DEV_FLAGS_IS_PCIE.  Maybe pci_is_pcie() should even
>> >> > take that into account.  That has some trickle through for
>> >> > pci_pcie_type() and all the accessor functions, but maybe it's a
>> >> > cleaner solution overall (or maybe it explodes further).  Thanks,
>> >>
>> >> I didn't really want to mark the glue bridges as PCIe or have fake
>> >> PCIe capability there, the obvious simple solution was to add
>> >> the flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS
>> >>
>> >> Any suggestions/comments on how to do this better would be welcome.
>> >
>> > I definitely don't think either flag idea is the right solution, I
>> > think you've actually already got the tools you need to solve it by
>> > putting the intelligence in the callback function or by going further
>> > down the path of pretending the glue bridge is really a host bridge.
>>
>> If I understand your suggestion, I need to fake the PCIe capability
>> for the glue bridges, which may be reasonable for the second level.
>> But for the first level, there is no pcie capability and faking that
>> becomes more difficult.
>
> That's not what I'm suggesting.
>
>> If the concern is adding a flag for just one platform, I can check
>> for the vendor ID/device ID in the pci_for_each_dma_alias()
>> function, which would be slightly uglier, but still a simple and
>> maintainable solution.
>>
>> Another option is to break the pci_for_each_dma_alias when it
>> reaches to root port (for vulcan), this too can be done without
>> a flag.
>
> Exactly, this can be done by the callback function provided on vulcan
> systems and requires no core changes.  The IOMMU needs to provide a
> callback function that makes sense for collecting aliases on the
> specific platform.  If the translation unit is not on the root bus, the
> callback function should stop the topology walk earlier.  Thanks,

Fixing every callback function is going to be painful and ugly.
I will have to change drivers/irqchip/irq-gic-v3-its-pci-msi.c,
drivers/pci/msi.c, and drivers/iommu/arm-smmu-v3.c with the
same check for vendor/device ids and return early.

I can update the patch to check the vendor/device id in the
case of ROOT_PORT in pci_for_each_dma_alias(). If you
think that will not be acceptable, I don't have any reasonable
options left.

Thanks,
JC.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson Feb. 18, 2016, 2:14 p.m. UTC | #8
On Thu, 18 Feb 2016 18:57:12 +0530
Jayachandran Chandrashekaran Nair
<jayachandran.chandrashekaran@broadcom.com> wrote:

> On Wed, Feb 17, 2016 at 8:58 PM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> > On Wed, 17 Feb 2016 17:15:09 +0530
> > Jayachandran Chandrashekaran Nair
> > <jayachandran.chandrashekaran@broadcom.com> wrote:
> >  
> >> On Wed, Feb 17, 2016 at 3:55 AM, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:  
> >> > On Wed, 17 Feb 2016 02:38:24 +0530
> >> > Jayachandran Chandrashekaran Nair
> >> > <jayachandran.chandrashekaran@broadcom.com> wrote:
> >> >  
> >> >> On Tue, Feb 16, 2016 at 1:09 AM, Alex Williamson
> >> >> <alex.williamson@redhat.com> wrote:  
> >> >> > On Mon, 15 Feb 2016 12:20:23 -0600
> >> >> > Bjorn Helgaas <helgaas@kernel.org> wrote:
> >> >> >  
> >> >> >> [+cc Alex, iommu list]
> >> >> >>
> >> >> >> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:  
> >> >> >> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
> >> >> >> > that should not be considered during DMA alias search. This is
> >> >> >> > to support hardware (in this case Broadcom Vulcan PCIe subsystem)
> >> >> >> > that has internal bridges which have either missing or wrong PCIe
> >> >> >> > capabilities.  
> >> >> >
> >> >> > I figured this would come at some point, the right answer is of course
> >> >> > to follow the PCIe spec and implement the required PCIe capability in
> >> >> > the hardware.  
> >> >>
> >> >> There are  PCIe controllers on the chip that follows the spec, the issue is
> >> >> how it is integrated to the northbridge (equivalent) on the chip, I have
> >> >> tried to explain it below.
> >> >>  
> >> >> >> This needs more explanation, like what exactly is wrong with this
> >> >> >> device?  A missing PCIe capability might cause other problems.
> >> >> >>
> >> >> >> What problem does this fix?  Without these patches, do we merely add
> >> >> >> aliases that are unnecessary?  Do we crash because something goes
> >> >> >> wrong in the pci_pcie_type() switch because of the incorrect
> >> >> >> capability?  
> >> >>
> >> >> Here's how (for example) how the PCI enumeration of a 2 node Vulcan
> >> >> processor will look like:
> >> >>
> >> >>
> >> >> [0] +-0.0.0--[1]---+--1.a.0----[2]-----2.0.0---[3]----3.0.0
> >> >>     |              +--1.a.1----[4]-----4.0.0---[5]----5.0.0
> >> >>     |              .
> >> >>     |              ... etc...
> >> >>     |
> >> >>     +-0.0.1--[10]--+-10.a.0----[11]---11.0.0---[12]---12.0.0
> >> >>                    +-10.a.1----[13]---13.0.0---[14]---14.0.0
> >> >>                    .
> >> >>                    ... etc...  
> >> >
> >> > So we have:
> >> >
> >> >     "Glue Bridge"---"Glue Bridges"---Root Ports---Endpoints
> >> >       (no pcie)      (pci/x-pcie)  
> >>
> >> The top level is one glue bridge per chip in a multi-chip board,
> >> but otherwise this is accurate.
> >>  
> >> >>
> >> >> The devices 0.0.x and x.a.x are glue bridges that are there to
> >> >> bring the real PCIe controllers (pcie cap type 4) 2.0.0, 4.0.0,
> >> >> 11.0.0, 13.0.0 under a single PCI hierarchy. The x.a.x bridges
> >> >> have a PCIe capability (type 8) and 0.0.x does not have any pcie
> >> >> capability.
> >> >>
> >> >> In case of Vulcan, both the GICv3 ITS driver code and the SMMUv3
> >> >> driver code does dma alias walk to find the device id to use
> >> >> in ITS and SMMU. In both cases it will ignore the x.0.0 bridges
> >> >> since they are type root port, but will continue on up and end
> >> >> up with incorrect device id.
> >> >>
> >> >> The flag I have added is to make the pci_for_each_dma_alias()
> >> >> ignore the last 2 levels of glue/internal bridges.  
> >> >
> >> > Per the PCIe spec, I'm not even sure what you have is a valid
> >> > hierarchy, root ports sit on root complexes, not behind a PCI-to-PCIe
> >> > bridge.  So really you're pretending that the downstream "glue bridge"
> >> > is your host bridge and therefore root complex, but the PCI topology
> >> > would clearly dictate that the top-most bus is conventional PCI and
> >> > therefore everything is an alias of everything else and the DMA alias
> >> > code is doing exactly what it should.  
> >>
> >> Yes, I am not arguing that there is any issue in the current code. I
> >> am trying to figure out the correct way to implement the quirk. We
> >> have to support the hardware we have, not the hardware we want to
> >> have :)
> >>  
> >> > Also note that the caller of pci_for_each_dma_alias() owns the callback
> >> > function triggered for each alias, that function could choose to prune
> >> > out these "glue bridges" itself if that's the appropriate thing to do.  
> >>
> >> I had implemented this first, but moved to the current approach because
> >> it is needed in multiple places. The problem is: "On vulcan, while
> >> going up the pcie bus heirarchy for finding aliases, skip the glue bridges",
> >> So the logical place to put the fix is in pci_for_each_dma_alias()
> >>  
> >> > Where do the SMMU and ITS actually reside in the above diagram?  You
> >> > can use the callout function to stop the traversal anywhere you wish,
> >> > it's free to return a -errno to stop or positive number, which the
> >> > caller can interpret as error or non-failure stop condition.  
> >>
> >> The SMMU (for translation requests) and ITS (for MSI writes for
> >> interrupts) are connected directly to the proper PCIe controller
> >> (2/4/11/13.0.0 above)  
> >
> >
> > If the translation unit is on the root port then DMA aliases upstream of
> > the translation unit are irrelevant and the callback function should
> > stop the traversal at that point.
> >  
> >> > You could even think about changing what Linux considers to be the host
> >> > bridge.  Maybe these glue bridges shouldn't even be visible to Linux,
> >> > would that also solve the issue with the broken BAR?  
> >>
> >> The glue bridges have to seen by Linux for assigning resources like
> >> bus ranges and memory windows. So these are required bridges in
> >> the topology and will work with the standard linux code
> >> (provided we skip them for aliases, and and ignore the BAR).
> >>  
> >> >> > The change takes the same code path as it would for a real PCIe bridge
> >> >> > port (downstream/upstream/root), which means they want to skip adding
> >> >> > this bridge as an alias of the device.  So we're adding in aliases that
> >> >> > don't exist, the bridge itself.
> >> >> >
> >> >> > If anything I'd suggest a flag that actually tries to address the
> >> >> > problem rather than a symptom of the problem.  For example, maybe the
> >> >> > flag should be PCI_DEV_FLAGS_IS_PCIE.  Maybe pci_is_pcie() should even
> >> >> > take that into account.  That has some trickle through for
> >> >> > pci_pcie_type() and all the accessor functions, but maybe it's a
> >> >> > cleaner solution overall (or maybe it explodes further).  Thanks,  
> >> >>
> >> >> I didn't really want to mark the glue bridges as PCIe or have fake
> >> >> PCIe capability there, the obvious simple solution was to add
> >> >> the flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS
> >> >>
> >> >> Any suggestions/comments on how to do this better would be welcome.  
> >> >
> >> > I definitely don't think either flag idea is the right solution, I
> >> > think you've actually already got the tools you need to solve it by
> >> > putting the intelligence in the callback function or by going further
> >> > down the path of pretending the glue bridge is really a host bridge.  
> >>
> >> If I understand your suggestion, I need to fake the PCIe capability
> >> for the glue bridges, which may be reasonable for the second level.
> >> But for the first level, there is no pcie capability and faking that
> >> becomes more difficult.  
> >
> > That's not what I'm suggesting.
> >  
> >> If the concern is adding a flag for just one platform, I can check
> >> for the vendor ID/device ID in the pci_for_each_dma_alias()
> >> function, which would be slightly uglier, but still a simple and
> >> maintainable solution.
> >>
> >> Another option is to break the pci_for_each_dma_alias when it
> >> reaches to root port (for vulcan), this too can be done without
> >> a flag.  
> >
> > Exactly, this can be done by the callback function provided on vulcan
> > systems and requires no core changes.  The IOMMU needs to provide a
> > callback function that makes sense for collecting aliases on the
> > specific platform.  If the translation unit is not on the root bus, the
> > callback function should stop the topology walk earlier.  Thanks,  
> 
> Fixing every callback function is going to be painful and ugly.
> I will have to change drivers/irqchip/irq-gic-v3-its-pci-msi.c,
> drivers/pci/msi.c, and drivers/iommu/arm-smmu-v3.c with the
> same check for vendor/device ids and return early.
> 
> I can update the patch to check the vendor/device id in the
> case of ROOT_PORT in pci_for_each_dma_alias(). If you
> think that will not be acceptable, I don't have any reasonable
> options left.

I've never known there to necessarily be a correlation between the
easiest solution and the correct one, but I think there's probably
still an easy and clean solution not far from your original approach.
pci_for_each_dma_alias() is meant to collect all the aliases for a
given device.  It stops at the root bus both because it can't go
further and because on many platforms, that's where the translation
unit lives.  If we want to have a common and easy way to stop it above
the root bus then we could flag a device as being the DMA alias root
and exit at that point.  I think that makes more sense architecturally
than some mystery devices flagged to skip reporting an alias.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jayachandran Chandrashekaran Nair Feb. 20, 2016, 6:15 p.m. UTC | #9
Hi Alex,

On Thu, Feb 18, 2016 at 7:44 PM, Alex Williamson
<alex.williamson@redhat.com> wrote:
> On Thu, 18 Feb 2016 18:57:12 +0530
> Jayachandran Chandrashekaran Nair
> <jayachandran.chandrashekaran@broadcom.com> wrote:
>
>> On Wed, Feb 17, 2016 at 8:58 PM, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>> > On Wed, 17 Feb 2016 17:15:09 +0530
>> > Jayachandran Chandrashekaran Nair
>> > <jayachandran.chandrashekaran@broadcom.com> wrote:
>> >
>> >> On Wed, Feb 17, 2016 at 3:55 AM, Alex Williamson
>> >> <alex.williamson@redhat.com> wrote:
>> >> > On Wed, 17 Feb 2016 02:38:24 +0530
>> >> > Jayachandran Chandrashekaran Nair
>> >> > <jayachandran.chandrashekaran@broadcom.com> wrote:
>> >> >
>> >> >> On Tue, Feb 16, 2016 at 1:09 AM, Alex Williamson
>> >> >> <alex.williamson@redhat.com> wrote:
>> >> >> > On Mon, 15 Feb 2016 12:20:23 -0600
>> >> >> > Bjorn Helgaas <helgaas@kernel.org> wrote:
>> >> >> >
>> >> >> >> [+cc Alex, iommu list]
>> >> >> >>
>> >> >> >> On Mon, Feb 15, 2016 at 03:35:00AM +0530, Jayachandran C wrote:
>> >> >> >> > Add a new flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS to indicate bridges
>> >> >> >> > that should not be considered during DMA alias search. This is
>> >> >> >> > to support hardware (in this case Broadcom Vulcan PCIe subsystem)
>> >> >> >> > that has internal bridges which have either missing or wrong PCIe
>> >> >> >> > capabilities.
>> >> >> >
>> >> >> > I figured this would come at some point, the right answer is of course
>> >> >> > to follow the PCIe spec and implement the required PCIe capability in
>> >> >> > the hardware.
>> >> >>
>> >> >> There are  PCIe controllers on the chip that follows the spec, the issue is
>> >> >> how it is integrated to the northbridge (equivalent) on the chip, I have
>> >> >> tried to explain it below.
>> >> >>
>> >> >> >> This needs more explanation, like what exactly is wrong with this
>> >> >> >> device?  A missing PCIe capability might cause other problems.
>> >> >> >>
>> >> >> >> What problem does this fix?  Without these patches, do we merely add
>> >> >> >> aliases that are unnecessary?  Do we crash because something goes
>> >> >> >> wrong in the pci_pcie_type() switch because of the incorrect
>> >> >> >> capability?
>> >> >>
>> >> >> Here's how (for example) how the PCI enumeration of a 2 node Vulcan
>> >> >> processor will look like:
>> >> >>
>> >> >>
>> >> >> [0] +-0.0.0--[1]---+--1.a.0----[2]-----2.0.0---[3]----3.0.0
>> >> >>     |              +--1.a.1----[4]-----4.0.0---[5]----5.0.0
>> >> >>     |              .
>> >> >>     |              ... etc...
>> >> >>     |
>> >> >>     +-0.0.1--[10]--+-10.a.0----[11]---11.0.0---[12]---12.0.0
>> >> >>                    +-10.a.1----[13]---13.0.0---[14]---14.0.0
>> >> >>                    .
>> >> >>                    ... etc...
>> >> >
>> >> > So we have:
>> >> >
>> >> >     "Glue Bridge"---"Glue Bridges"---Root Ports---Endpoints
>> >> >       (no pcie)      (pci/x-pcie)
>> >>
>> >> The top level is one glue bridge per chip in a multi-chip board,
>> >> but otherwise this is accurate.
>> >>
>> >> >>
>> >> >> The devices 0.0.x and x.a.x are glue bridges that are there to
>> >> >> bring the real PCIe controllers (pcie cap type 4) 2.0.0, 4.0.0,
>> >> >> 11.0.0, 13.0.0 under a single PCI hierarchy. The x.a.x bridges
>> >> >> have a PCIe capability (type 8) and 0.0.x does not have any pcie
>> >> >> capability.
>> >> >>
>> >> >> In case of Vulcan, both the GICv3 ITS driver code and the SMMUv3
>> >> >> driver code does dma alias walk to find the device id to use
>> >> >> in ITS and SMMU. In both cases it will ignore the x.0.0 bridges
>> >> >> since they are type root port, but will continue on up and end
>> >> >> up with incorrect device id.
>> >> >>
>> >> >> The flag I have added is to make the pci_for_each_dma_alias()
>> >> >> ignore the last 2 levels of glue/internal bridges.
>> >> >
>> >> > Per the PCIe spec, I'm not even sure what you have is a valid
>> >> > hierarchy, root ports sit on root complexes, not behind a PCI-to-PCIe
>> >> > bridge.  So really you're pretending that the downstream "glue bridge"
>> >> > is your host bridge and therefore root complex, but the PCI topology
>> >> > would clearly dictate that the top-most bus is conventional PCI and
>> >> > therefore everything is an alias of everything else and the DMA alias
>> >> > code is doing exactly what it should.
>> >>
>> >> Yes, I am not arguing that there is any issue in the current code. I
>> >> am trying to figure out the correct way to implement the quirk. We
>> >> have to support the hardware we have, not the hardware we want to
>> >> have :)
>> >>
>> >> > Also note that the caller of pci_for_each_dma_alias() owns the callback
>> >> > function triggered for each alias, that function could choose to prune
>> >> > out these "glue bridges" itself if that's the appropriate thing to do.
>> >>
>> >> I had implemented this first, but moved to the current approach because
>> >> it is needed in multiple places. The problem is: "On vulcan, while
>> >> going up the pcie bus heirarchy for finding aliases, skip the glue bridges",
>> >> So the logical place to put the fix is in pci_for_each_dma_alias()
>> >>
>> >> > Where do the SMMU and ITS actually reside in the above diagram?  You
>> >> > can use the callout function to stop the traversal anywhere you wish,
>> >> > it's free to return a -errno to stop or positive number, which the
>> >> > caller can interpret as error or non-failure stop condition.
>> >>
>> >> The SMMU (for translation requests) and ITS (for MSI writes for
>> >> interrupts) are connected directly to the proper PCIe controller
>> >> (2/4/11/13.0.0 above)
>> >
>> >
>> > If the translation unit is on the root port then DMA aliases upstream of
>> > the translation unit are irrelevant and the callback function should
>> > stop the traversal at that point.
>> >
>> >> > You could even think about changing what Linux considers to be the host
>> >> > bridge.  Maybe these glue bridges shouldn't even be visible to Linux,
>> >> > would that also solve the issue with the broken BAR?
>> >>
>> >> The glue bridges have to seen by Linux for assigning resources like
>> >> bus ranges and memory windows. So these are required bridges in
>> >> the topology and will work with the standard linux code
>> >> (provided we skip them for aliases, and and ignore the BAR).
>> >>
>> >> >> > The change takes the same code path as it would for a real PCIe bridge
>> >> >> > port (downstream/upstream/root), which means they want to skip adding
>> >> >> > this bridge as an alias of the device.  So we're adding in aliases that
>> >> >> > don't exist, the bridge itself.
>> >> >> >
>> >> >> > If anything I'd suggest a flag that actually tries to address the
>> >> >> > problem rather than a symptom of the problem.  For example, maybe the
>> >> >> > flag should be PCI_DEV_FLAGS_IS_PCIE.  Maybe pci_is_pcie() should even
>> >> >> > take that into account.  That has some trickle through for
>> >> >> > pci_pcie_type() and all the accessor functions, but maybe it's a
>> >> >> > cleaner solution overall (or maybe it explodes further).  Thanks,
>> >> >>
>> >> >> I didn't really want to mark the glue bridges as PCIe or have fake
>> >> >> PCIe capability there, the obvious simple solution was to add
>> >> >> the flag PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS
>> >> >>
>> >> >> Any suggestions/comments on how to do this better would be welcome.
>> >> >
>> >> > I definitely don't think either flag idea is the right solution, I
>> >> > think you've actually already got the tools you need to solve it by
>> >> > putting the intelligence in the callback function or by going further
>> >> > down the path of pretending the glue bridge is really a host bridge.
>> >>
>> >> If I understand your suggestion, I need to fake the PCIe capability
>> >> for the glue bridges, which may be reasonable for the second level.
>> >> But for the first level, there is no pcie capability and faking that
>> >> becomes more difficult.
>> >
>> > That's not what I'm suggesting.
>> >
>> >> If the concern is adding a flag for just one platform, I can check
>> >> for the vendor ID/device ID in the pci_for_each_dma_alias()
>> >> function, which would be slightly uglier, but still a simple and
>> >> maintainable solution.
>> >>
>> >> Another option is to break the pci_for_each_dma_alias when it
>> >> reaches to root port (for vulcan), this too can be done without
>> >> a flag.
>> >
>> > Exactly, this can be done by the callback function provided on vulcan
>> > systems and requires no core changes.  The IOMMU needs to provide a
>> > callback function that makes sense for collecting aliases on the
>> > specific platform.  If the translation unit is not on the root bus, the
>> > callback function should stop the topology walk earlier.  Thanks,
>>
>> Fixing every callback function is going to be painful and ugly.
>> I will have to change drivers/irqchip/irq-gic-v3-its-pci-msi.c,
>> drivers/pci/msi.c, and drivers/iommu/arm-smmu-v3.c with the
>> same check for vendor/device ids and return early.
>>
>> I can update the patch to check the vendor/device id in the
>> case of ROOT_PORT in pci_for_each_dma_alias(). If you
>> think that will not be acceptable, I don't have any reasonable
>> options left.
>
> I've never known there to necessarily be a correlation between the
> easiest solution and the correct one, but I think there's probably
> still an easy and clean solution not far from your original approach.
> pci_for_each_dma_alias() is meant to collect all the aliases for a
> given device.  It stops at the root bus both because it can't go
> further and because on many platforms, that's where the translation
> unit lives.  If we want to have a common and easy way to stop it above
> the root bus then we could flag a device as being the DMA alias root
> and exit at that point.  I think that makes more sense architecturally
> than some mystery devices flagged to skip reporting an alias.  Thanks,

Thanks for the suggestion, I will try to implement this approach and
submit a newer patchset.

JC.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/search.c b/drivers/pci/search.c
index a20ce7d..e5296aa 100644
--- a/drivers/pci/search.c
+++ b/drivers/pci/search.c
@@ -55,6 +55,8 @@  int pci_for_each_dma_alias(struct pci_dev *pdev,
 			continue;
 
 		tmp = bus->self;
+		if (tmp->dev_flags & PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS)
+			continue;
 
 		/*
 		 * PCIe-to-PCI/X bridges alias transactions from downstream
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 27df4a6..b4d8215 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -182,6 +182,8 @@  enum pci_dev_flags {
 	PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 7),
 	/* Get VPD from function 0 VPD */
 	PCI_DEV_FLAGS_VPD_REF_F0 = (__force pci_dev_flags_t) (1 << 8),
+	/* Bridge should be ignored for alias search  */
+	PCI_DEV_FLAGS_BRIDGE_SKIP_ALIAS = (__force pci_dev_flags_t) (1 << 9),
 };
 
 enum pci_irq_reroute_variant {