[v3,1/4] PCI: of: Clear 64-bit flag for non-prefetchable memory below 4GB

Message ID 20210607112856.3499682-2-punitagrawal@gmail.com (mailing list archive)
State New, archived
Series PCI: of: Improvements to handle 64-bit attribute for non-prefetchable ranges

Commit Message

Punit Agrawal June 7, 2021, 11:28 a.m. UTC
Some host bridges advertise non-prefetchable memory windows that are
entirely located below 4GB but are marked as 64-bit address memory.

Since commit 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource
flags for 64-bit memory addresses"), the OF PCI range parser takes a
stricter view and treats 64-bit address ranges as advertised, whereas
previously such ranges were treated as 32-bit.

A PCI root port modelled as a PCI-to-PCI bridge cannot forward 64-bit
non-prefetchable memory ranges. As a result, the change in behaviour
due to the commit causes failure to allocate 32-bit BAR from a 64-bit
non-prefetchable window.

In order to not break platforms where non-prefetchable memory ranges
lie entirely below 4GB, clear the 64-bit flag.

Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rob Herring <robh+dt@kernel.org>
---
 drivers/pci/of.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Bjorn Helgaas June 10, 2021, 12:22 a.m. UTC | #1
[+cc Leonardo]

On Mon, Jun 07, 2021 at 08:28:53PM +0900, Punit Agrawal wrote:
> Some host bridges advertise non-prefetchable memory windows that are
> entirely located below 4GB but are marked as 64-bit address memory.
> 
> Since commit 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource
> flags for 64-bit memory addresses"), the OF PCI range parser takes a
> stricter view and treats 64-bit address ranges as advertised while
> before such ranges were treated as 32-bit.
> 
> A PCI root port modelled as a PCI-to-PCI bridge cannot forward 64-bit
> non-prefetchable memory ranges. As a result, the change in behaviour
> due to the commit causes failure to allocate 32-bit BAR from a 64-bit
> non-prefetchable window.
> 
> In order to not break platforms where non-prefetchable memory ranges
> lie entirely below 4GB, clear the 64-bit flag.

I don't think we should care about the address width DT supplies for a
host bridge window.  Prior to 9d57e61bf723, I don't think we *did*
care because of_bus_pci_get_flags() threw away that information.

My proposal for a commit log, including information about the problem
report and a "Fixes:" tag:

  Alexandru and Qu reported this resource allocation failure on
  ROCKPro64 v2 and ROCK Pi 4B, both based on the RK3399:

    pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff 64bit]
    pci 0000:00:00.0: PCI bridge to [bus 01]
    pci 0000:00:00.0: BAR 14: no space for [mem size 0x00100000]
    pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]

  "BAR 14" is the PCI bridge's 32-bit non-prefetchable window, and our
  PCI allocation code isn't smart enough to allocate it in a host
  bridge window marked as 64-bit, even though this should work fine.

  A DT host bridge description includes the windows from the CPU
  address space to the PCI bus space.  On a few architectures
  (microblaze, powerpc, sparc), the DT may also describe PCI devices
  themselves, including their BARs.

  Before 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource
  flags for 64-bit memory addresses"), of_bus_pci_get_flags() ignored
  the fact that some DT addresses described 64-bit windows and BARs.
  That was a problem because the virtio virtual NIC has a 32-bit BAR
  and a 64-bit BAR, and the driver couldn't distinguish them.

  9d57e61bf723 set IORESOURCE_MEM_64 for those 64-bit DT ranges, which
  fixed the virtio driver.  But it also set IORESOURCE_MEM_64 for host
  bridge windows, which exposed the fact that the PCI allocator isn't
  smart enough to put 32-bit resources in those 64-bit windows.

  Clear IORESOURCE_MEM_64 from host bridge windows since we don't need
  that information.

  Fixes: 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses")
  Reported-at: https://lore.kernel.org/lkml/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com/
  Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
  Reported-by: Qu Wenruo <wqu@suse.com>

> Suggested-by: Ard Biesheuvel <ardb@kernel.org>
> Link: https://lore.kernel.org/r/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com
> Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: Rob Herring <robh+dt@kernel.org>
> ---
>  drivers/pci/of.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index 85dcb7097da4..1e45186a5715 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -353,6 +353,14 @@ static int devm_of_pci_get_host_bridge_resources(struct device *dev,
>  				dev_warn(dev, "More than one I/O resource converted for %pOF. CPU base address for old range lost!\n",
>  					 dev_node);
>  			*io_base = range.cpu_addr;
> +		} else if (resource_type(res) == IORESOURCE_MEM) {
> +			if (!(res->flags & IORESOURCE_PREFETCH)) {
> +				if (res->flags & IORESOURCE_MEM_64)
> +					if (!upper_32_bits(range.pci_addr + range.size - 1)) {
> +						dev_warn(dev, "Clearing 64-bit flag for non-prefetchable memory below 4GB\n");
> +						res->flags &= ~IORESOURCE_MEM_64;
> +					}
> +			}

Why do we need to check IORESOURCE_PREFETCH, IORESOURCE_MEM_64, and
upper_32_bits()?  If I understand this correctly, prior to
9d57e61bf723, IORESOURCE_MEM_64 was *never* set here.  Isn't something
like this sufficient?

  } else if (resource_type(res) == IORESOURCE_MEM) {
    res->flags &= ~IORESOURCE_MEM_64;
  }

I'm not sure we need a warning either.  We didn't warn before
9d57e61bf723, and there's nothing the user needs to do anyway.

>  		}
>  
>  		pci_add_resource_offset(resources, res,	res->start - range.pci_addr);
> -- 
> 2.30.2
>
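
The difference between the v3 check and the simpler clear suggested in
the review can be modelled in user space. This is only a sketch: the
DEMO_* flag values and both function names are hypothetical and are not
the kernel's IORESOURCE_* definitions, and a plain bit test stands in
for resource_type(). The address arithmetic mirrors
upper_32_bits(range.pci_addr + range.size - 1) from the patch.

```c
#include <stdint.h>

/* Placeholder flag bits for illustration only; not the kernel's values. */
#define DEMO_MEM       (1u << 0)
#define DEMO_PREFETCH  (1u << 1)
#define DEMO_MEM_64    (1u << 2)

/*
 * Model of the v3 patch: clear the 64-bit flag only for a
 * non-prefetchable memory window whose last PCI address is below 4GB.
 */
uint32_t demo_fixup_v3(uint32_t flags, uint64_t pci_addr, uint64_t size)
{
	uint64_t end = pci_addr + size - 1;

	if ((flags & DEMO_MEM) && !(flags & DEMO_PREFETCH) &&
	    (flags & DEMO_MEM_64) && (end >> 32) == 0)
		flags &= ~DEMO_MEM_64;

	return flags;
}

/* Model of the review suggestion: clear the flag on every memory window. */
uint32_t demo_fixup_simple(uint32_t flags)
{
	if (flags & DEMO_MEM)
		flags &= ~DEMO_MEM_64;

	return flags;
}
```

For the RK3399 window from the report (0xfa000000-0xfbdfffff, that is
size 0x1e00000), both variants clear the flag. They differ only for
prefetchable windows and for windows that extend above 4GB, where the
v3 variant keeps the 64-bit flag and the simple variant drops it.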
Punit Agrawal June 10, 2021, 1:34 p.m. UTC | #2
Hi Bjorn,

Bjorn Helgaas <helgaas@kernel.org> writes:

> [+cc Leonardo]
>
> On Mon, Jun 07, 2021 at 08:28:53PM +0900, Punit Agrawal wrote:
>> Some host bridges advertise non-prefetchable memory windows that are
>> entirely located below 4GB but are marked as 64-bit address memory.
>> 
>> Since commit 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource
>> flags for 64-bit memory addresses"), the OF PCI range parser takes a
>> stricter view and treats 64-bit address ranges as advertised while
>> before such ranges were treated as 32-bit.
>> 
>> A PCI root port modelled as a PCI-to-PCI bridge cannot forward 64-bit
>> non-prefetchable memory ranges. As a result, the change in behaviour
>> due to the commit causes failure to allocate 32-bit BAR from a 64-bit
>> non-prefetchable window.
>> 
>> In order to not break platforms where non-prefetchable memory ranges
>> lie entirely below 4GB, clear the 64-bit flag.
>
> I don't think we should care about the address width DT supplies for a
> host bridge window.  Prior to 9d57e61bf723, I don't think we *did*
> care because of_bus_pci_get_flags() threw away that information.
>
> My proposal for a commit log, including information about the problem
> report and a "Fixes:" tag:
>
>   Alexandru and Qu reported this resource allocation failure on
>   ROCKPro64 v2 and ROCK Pi 4B, both based on the RK3399:
>
>     pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff 64bit]
>     pci 0000:00:00.0: PCI bridge to [bus 01]
>     pci 0000:00:00.0: BAR 14: no space for [mem size 0x00100000]
>     pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
>
>   "BAR 14" is the PCI bridge's 32-bit non-prefetchable window, and our
>   PCI allocation code isn't smart enough to allocate it in a host
>   bridge window marked as 64-bit, even though this should work fine.
>
>   A DT host bridge description includes the windows from the CPU
>   address space to the PCI bus space.  On a few architectures
>   (microblaze, powerpc, sparc), the DT may also describe PCI devices
>   themselves, including their BARs.
>
>   Before 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource
>   flags for 64-bit memory addresses"), of_bus_pci_get_flags() ignored
>   the fact that some DT addresses described 64-bit windows and BARs.
>   That was a problem because the virtio virtual NIC has a 32-bit BAR
>   and a 64-bit BAR, and the driver couldn't distinguish them.

Many thanks for demystifying the motivation for 9d57e61bf723. Not being
familiar with the usage of DT to describe PCI devices I was missing this
context.

>   9d57e61bf723 set IORESOURCE_MEM_64 for those 64-bit DT ranges, which
>   fixed the virtio driver.  But it also set IORESOURCE_MEM_64 for host
>   bridge windows, which exposed the fact that the PCI allocator isn't
>   smart enough to put 32-bit resources in those 64-bit windows.
>
>   Clear IORESOURCE_MEM_64 from host bridge windows since we don't need
>   that information.
>
>   Fixes: 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses")
>   Reported-at: https://lore.kernel.org/lkml/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com/
>   Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
>   Reported-by: Qu Wenruo <wqu@suse.com>

Thank you for the commit log - without all the pieces I was struggling to
clearly describe the details. And I missed the appropriate tags as
well. I've updated the commit log based on your suggestion.

>> Suggested-by: Ard Biesheuvel <ardb@kernel.org>
>> Link: https://lore.kernel.org/r/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com
>> Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
>> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> Cc: Bjorn Helgaas <bhelgaas@google.com>
>> Cc: Rob Herring <robh+dt@kernel.org>
>> ---
>>  drivers/pci/of.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>> 
>> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
>> index 85dcb7097da4..1e45186a5715 100644
>> --- a/drivers/pci/of.c
>> +++ b/drivers/pci/of.c
>> @@ -353,6 +353,14 @@ static int devm_of_pci_get_host_bridge_resources(struct device *dev,
>>  				dev_warn(dev, "More than one I/O resource converted for %pOF. CPU base address for old range lost!\n",
>>  					 dev_node);
>>  			*io_base = range.cpu_addr;
>> +		} else if (resource_type(res) == IORESOURCE_MEM) {
>> +			if (!(res->flags & IORESOURCE_PREFETCH)) {
>> +				if (res->flags & IORESOURCE_MEM_64)
>> +					if (!upper_32_bits(range.pci_addr + range.size - 1)) {
>> +						dev_warn(dev, "Clearing 64-bit flag for non-prefetchable memory below 4GB\n");
>> +						res->flags &= ~IORESOURCE_MEM_64;
>> +					}
>> +			}
>
> Why do we need to check IORESOURCE_PREFETCH, IORESOURCE_MEM_64, and
> upper_32_bits()?  If I understand this correctly, prior to
> 9d57e61bf723, IORESOURCE_MEM_64 was *never* set here.  Isn't something
> like this sufficient?
>
>   } else if (resource_type(res) == IORESOURCE_MEM) {
>     res->flags &= ~IORESOURCE_MEM_64;
>   }

Based on the discussion in the original thread[0], I was working with
the assumption that we don't want to lose the IORESOURCE_MEM_64 flag
other than in the problem scenario, i.e., non-prefetchable memory below
4GB.

Your suggestion is simpler and also solves the issue by effectively
reverting the impact of 9d57e61bf723 on BAR allocation. If there are no
objections I will take this approach for the next update.

To aid future readers I will also add the following comment -

    /*
     * PCI allocation cannot correctly allocate 32-bit non-prefetchable BAR
     * in host bridge windows marked as 64-bit.
     */

> I'm not sure we need a warning either.  We didn't warn before
> 9d57e61bf723, and there's nothing the user needs to do anyway.

The warning was a nudge (probably too subtle) to get the user to upgrade
their DT to drop the 64-bit marker on the host bridge window. With your
suggestion, the DT change is not needed anymore - though it may still be
worth dropping the 64-bit marker.

Thanks,
Punit

[0] https://lore.kernel.org/linux-pci/CAMj1kXGF_JmuZ+rRA55-NrTQ6f20fhcHc=62AGJ71eHNU8AoBQ@mail.gmail.com/
Bjorn Helgaas June 10, 2021, 6:28 p.m. UTC | #3
On Thu, Jun 10, 2021 at 10:34:56PM +0900, Punit Agrawal wrote:
> Hi Bjorn,
> 
> Bjorn Helgaas <helgaas@kernel.org> writes:
> 
> > [+cc Leonardo]
> >
> > On Mon, Jun 07, 2021 at 08:28:53PM +0900, Punit Agrawal wrote:
> >> Some host bridges advertise non-prefetchable memory windows that are
> >> entirely located below 4GB but are marked as 64-bit address memory.
> >> 
> >> Since commit 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource
> >> flags for 64-bit memory addresses"), the OF PCI range parser takes a
> >> stricter view and treats 64-bit address ranges as advertised while
> >> before such ranges were treated as 32-bit.
> >> 
> >> A PCI root port modelled as a PCI-to-PCI bridge cannot forward 64-bit
> >> non-prefetchable memory ranges. As a result, the change in behaviour
> >> due to the commit causes failure to allocate 32-bit BAR from a 64-bit
> >> non-prefetchable window.
> >> 
> >> In order to not break platforms where non-prefetchable memory ranges
> >> lie entirely below 4GB, clear the 64-bit flag.
> >
> > I don't think we should care about the address width DT supplies for a
> > host bridge window.  Prior to 9d57e61bf723, I don't think we *did*
> > care because of_bus_pci_get_flags() threw away that information.
> >
> > My proposal for a commit log, including information about the problem
> > report and a "Fixes:" tag:
> >
> >   Alexandru and Qu reported this resource allocation failure on
> >   ROCKPro64 v2 and ROCK Pi 4B, both based on the RK3399:
> >
> >     pci_bus 0000:00: root bus resource [mem 0xfa000000-0xfbdfffff 64bit]
> >     pci 0000:00:00.0: PCI bridge to [bus 01]
> >     pci 0000:00:00.0: BAR 14: no space for [mem size 0x00100000]
> >     pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit]
> >
> >   "BAR 14" is the PCI bridge's 32-bit non-prefetchable window, and our
> >   PCI allocation code isn't smart enough to allocate it in a host
> >   bridge window marked as 64-bit, even though this should work fine.
> >
> >   A DT host bridge description includes the windows from the CPU
> >   address space to the PCI bus space.  On a few architectures
> >   (microblaze, powerpc, sparc), the DT may also describe PCI devices
> >   themselves, including their BARs.
> >
> >   Before 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource
> >   flags for 64-bit memory addresses"), of_bus_pci_get_flags() ignored
> >   the fact that some DT addresses described 64-bit windows and BARs.
> >   That was a problem because the virtio virtual NIC has a 32-bit BAR
> >   and a 64-bit BAR, and the driver couldn't distinguish them.
> 
> Many thanks for demystifying the motivation for 9d57e61bf723. Not being
> familiar with the usage of DT to describe PCI devices I was missing this
> context.

The use of DT to describe PCI devices is a mystery to me, too.  I'm
guessing this is related to hypervisors that don't fully virtualize
PCI devices.  

> >   9d57e61bf723 set IORESOURCE_MEM_64 for those 64-bit DT ranges, which
> >   fixed the virtio driver.  But it also set IORESOURCE_MEM_64 for host
> >   bridge windows, which exposed the fact that the PCI allocator isn't
> >   smart enough to put 32-bit resources in those 64-bit windows.
> >
> >   Clear IORESOURCE_MEM_64 from host bridge windows since we don't need
> >   that information.
> >
> >   Fixes: 9d57e61bf723 ("of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses")
> >   Reported-at: https://lore.kernel.org/lkml/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com/
> >   Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
> >   Reported-by: Qu Wenruo <wqu@suse.com>
> 
> Thank you for the commit log - without all the pieces I was struggling to
> clearly describe the details. And I missed the appropriate tags as
> well. I've updated the commit log based on your suggestion.
> 
> >> Suggested-by: Ard Biesheuvel <ardb@kernel.org>
> >> Link: https://lore.kernel.org/r/7a1e2ebc-f7d8-8431-d844-41a9c36a8911@arm.com
> >> Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
> >> Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
> >> Cc: Bjorn Helgaas <bhelgaas@google.com>
> >> Cc: Rob Herring <robh+dt@kernel.org>
> >> ---
> >>  drivers/pci/of.c | 8 ++++++++
> >>  1 file changed, 8 insertions(+)
> >> 
> >> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> >> index 85dcb7097da4..1e45186a5715 100644
> >> --- a/drivers/pci/of.c
> >> +++ b/drivers/pci/of.c
> >> @@ -353,6 +353,14 @@ static int devm_of_pci_get_host_bridge_resources(struct device *dev,
> >>  				dev_warn(dev, "More than one I/O resource converted for %pOF. CPU base address for old range lost!\n",
> >>  					 dev_node);
> >>  			*io_base = range.cpu_addr;
> >> +		} else if (resource_type(res) == IORESOURCE_MEM) {
> >> +			if (!(res->flags & IORESOURCE_PREFETCH)) {
> >> +				if (res->flags & IORESOURCE_MEM_64)
> >> +					if (!upper_32_bits(range.pci_addr + range.size - 1)) {
> >> +						dev_warn(dev, "Clearing 64-bit flag for non-prefetchable memory below 4GB\n");
> >> +						res->flags &= ~IORESOURCE_MEM_64;
> >> +					}
> >> +			}
> >
> > Why do we need to check IORESOURCE_PREFETCH, IORESOURCE_MEM_64, and
> > upper_32_bits()?  If I understand this correctly, prior to
> > 9d57e61bf723, IORESOURCE_MEM_64 was *never* set here.  Isn't something
> > like this sufficient?
> >
> >   } else if (resource_type(res) == IORESOURCE_MEM) {
> >     res->flags &= ~IORESOURCE_MEM_64;
> >   }
> 
> Based on the discussion in the original thread[0], I was working with
> the assumption that we don't want to lose the IORESOURCE_MEM_64 flag
> other than in the problem scenario, i.e., non-prefetchable memory below
> 4GB.
> 
> Your suggestion is simpler and also solves the issue by effectively
> reverting the impact of 9d57e61bf723 on BAR allocation. If there are no
> objections I will take this approach for the next update.
> 
> To aid future readers I will also add the following comment -
> 
>     /*
>      * PCI allocation cannot correctly allocate 32-bit non-prefetchable BAR
>      * in host bridge windows marked as 64-bit.
>      */
> 
> > I'm not sure we need a warning either.  We didn't warn before
> > 9d57e61bf723, and there's nothing the user needs to do anyway.
> 
> The warning was a nudge (probably too subtle) to get the user to upgrade
> their DT to drop the 64-bit marker on the host bridge window. With your
> suggestion, the DT change is not needed anymore - though it may still be
> worth dropping the 64-bit marker.

I'm certainly not a DT expert, and Rob would know better.

The doc I'm looking at ([1]), says in sec 2.2.1.1 that for an address
in 32-bit-address Memory Space, the high-order address bits "hh...hh
must be zero" and only the 32 bits in "ll...ll" are usable.

That suggests to me that the DT probably *should* use 64-bit-address
Memory Space for things that don't fit in 32 bits.  But when we use
such an address for PCI host bridge windows, I don't think the
distinction is useful, so I think we should just drop the 64-bit
indication silently.

> [0] https://lore.kernel.org/linux-pci/CAMj1kXGF_JmuZ+rRA55-NrTQ6f20fhcHc=62AGJ71eHNU8AoBQ@mail.gmail.com/

[1] PCI Bus Binding to: IEEE Std 1275-1994 Standard for Boot
(Initialization Configuration) Firmware, Revision 2.1 [this is
ancient, and I would welcome a pointer to something better]
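
For readers unfamiliar with the binding, the 32-bit versus 64-bit
distinction discussed above comes from the "ss" space code in the first
(phys.hi) cell of a DT PCI address, with the "p" bit marking
prefetchable memory. The following is a minimal user-space sketch of
that decoding, assuming the phys.hi layout npt000ss bbbbbbbb dddddfff
rrrrrrrr from the binding. The SK_* flag values and the function name
are hypothetical and are not the kernel's of_bus_pci_get_flags() or
its IORESOURCE_* constants.

```c
#include <stdint.h>

/* Placeholder flag bits for illustration only; not the kernel's values. */
#define SK_IO        (1u << 0)
#define SK_MEM       (1u << 1)
#define SK_PREFETCH  (1u << 2)
#define SK_MEM_64    (1u << 3)

/* Decode the space code and prefetchable bit of a phys.hi cell. */
uint32_t sk_phys_hi_to_flags(uint32_t phys_hi)
{
	uint32_t flags = 0;

	switch ((phys_hi >> 24) & 0x3) {	/* "ss" space code */
	case 0x1:				/* I/O space */
		flags |= SK_IO;
		break;
	case 0x2:				/* 32-bit-address memory space */
		flags |= SK_MEM;
		break;
	case 0x3:				/* 64-bit-address memory space */
		flags |= SK_MEM | SK_MEM_64;
		break;
	default:				/* 0x0: configuration space */
		break;
	}

	if (phys_hi & 0x40000000u)		/* "p" bit: prefetchable */
		flags |= SK_PREFETCH;

	return flags;
}
```

With this model, a ranges entry starting with 0x02000000 decodes to a
32-bit memory window and one starting with 0x03000000 to a 64-bit
memory window, which is the marking that triggered the allocation
failure in the report.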

Patch

diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 85dcb7097da4..1e45186a5715 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -353,6 +353,14 @@  static int devm_of_pci_get_host_bridge_resources(struct device *dev,
 				dev_warn(dev, "More than one I/O resource converted for %pOF. CPU base address for old range lost!\n",
 					 dev_node);
 			*io_base = range.cpu_addr;
+		} else if (resource_type(res) == IORESOURCE_MEM) {
+			if (!(res->flags & IORESOURCE_PREFETCH)) {
+				if (res->flags & IORESOURCE_MEM_64)
+					if (!upper_32_bits(range.pci_addr + range.size - 1)) {
+						dev_warn(dev, "Clearing 64-bit flag for non-prefetchable memory below 4GB\n");
+						res->flags &= ~IORESOURCE_MEM_64;
+					}
+			}
 		}
 
 		pci_add_resource_offset(resources, res,	res->start - range.pci_addr);